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Abstract 



Most of the worlds currently accessible hydrocarbon resources are found in tight rocks - rocks 
with permeabilities in the millidarcy range and with pore sizes in the nanometer range. Tight 
rocks pose new scientific problems because of the small length-scales involved. Traditional oil 
plays are found in, for example, sand stone reservoirs with millimeter to micrometer sized pores. 
For such systems, standard hydrodynamics is a sufficient tool to understand, describe and predict 
fluid transport. In tight rocks, however, typical pore sizes are in the range of tens to hundreds of 
nanometers. At this scale, the mean free path - the average distance a particle moves between 
collisions - may become comparable to the characteristic sizes of the porous medium, and the 
standard assumption that the fluid can be described as a continuum no longer holds. The mean 
free path of dilute gases are often tens of nanometers, or higher. 

The gas from tight rocks, like shales, is stored inside closed pore networks or adsorbed onto 
organic matter. In order to extract adequate levels of gas from such rocks, we generate fractures 
to increase the permeability of the rock. The gas flows from small pore networks with diameters 
down to a few nanometers. The rate at which the gas flows through such networks is proportional 
to the permeability of the material - a result known as Darcy's law. Dilute gases in nanoporous 
media have a non-zero slip velocity which can cause an increase of permeability of a factor 50 
compared to what continuum theory predicts. This is an effect known as the Klinkenberg effect. 
In order to be able to increase gas production rates in a safe way, we need to understand the 
physics at this scale. This requires models that are valid at these length scales. 

In this thesis, we implement two numerical particle models. The first is called Molecular Dy- 
namics. It describes the motion of atoms by using parameterized potentials to compute forces 
between them. The second model, Direct Simulation Monte Carlo, uses the principles of statis- 
tical mechanics allowing larger systems to be studied. Both implementations support arbitrary 
geometries, and show promising scaling efficiency on massive parallel machines. We use these 
models to study the Klinkenberg effect and confirm that the Knudsen permeability correction 
correctly predicts the permeability for systems with pore sizes down to a few nanometers. We 
also analyze more complicated geometries, and argue that a stochastic version of the Knudsen 
correction is needed for geometries without a well defined Knudsen number. 

Highly efficient custom 3D visualization tools are developed using modern OpenGL techniques 
such as instanced geometry shaders, billboards and the marching cubes algorithm. Systems 
with tens of millions of particles can be rendered realtime with decent frame rate, allowing us 
to study larger systems than what can be done with already existing free software. All software 
developed during this thesis can serve as tools for further study in the field. 
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pores, and the gas gathers in the drilling hole through the network of larger frac- 
ture. From the drilling hole, the gas flows to the surface. Illustration: Sigve B0e 
Skattum 14 

2.1 An example of a porous medium. Here we see a metal foam - a solid metal with 
pore space available for a fluid. A nanoporous medium is a porous medium where 
the size of the pores are at the nanometer scale. Image from http:/ /en.wikipedia. 
org/wiki/File:Metal_Foam_in_Scanning_Electron_Microscope,_magnification_ 
10x.GIF, accessed 28 March, 2014 22 

2.2 A box with volume V = LA with fixed pressure values at x = 0 and x = L. The 
volumetric flow rate Q through a cross sectional area A is given by Darcy's law 

in equation (2.7) 23 

2.3 Slip length is the distance into the wall we would have to extrapolate a velocity 
profile for it to reach zero value. We have the no-slip condition on the left, where 

the slip length is zero whereas we have a non-zero slip length on the right 26 



2.4 The four flow regimes covering the important regions in the Knudsen number 
range where different flow types appear. In the low Knudsen number limit, the 
fluid can be assumed to be a continuum and we can use equations like the Euler 
equation or the NSE. For larger Knudsen numbers, the no-slip boundary condition 
is invalid and we need a model satisfying slip velocity. We reach the transition 
flow regime at Kn~ 0.1 where the continuum models do no longer hold, even 
with slip boundary conditions. In the high Knudsen regime particle collisions are 
so rare that it is classified as free molecular flow. In this range, the collisionless 
Boltzmann equation is valid (see section 3.4) 28 
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5.3 The collision detection algorithm. A particle first moves the full timestep At 
(step 1) from an empty voxel to an inner wall voxel. It is then moved back half a 
timestep At/2 (step 2) to see whether or not it is still in the wall. In step 3 we have 
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step 5 we choose a new velocity based on the normal and tangent vectors that 
boundary voxel has. Then we continue the timestep. This whole process may 



happen several times during one timestep 73 

5.4 Illustration of how the spatial domain can be divided into four sub domains, each 
controlled by a processor. Each processor contains many particles that are placed 
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5.5 Processor labeling in a 3-dimensional grid. Each processor is uniquely identified 
through its coordinate (p x ,Py,Pz) 78 

5.6 The middle node (1,1) has 8 neighbors it needs to communicate with. Each node 
only needs to communicate with its nearest neighbors (4 in two dimensions, 6 
in three dimensions), because the nearest neighbors can work as intermediate 
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6.1 The system clearly reaches the Maxwell-Boltzmann distribution when the initial 
velocity distribution follows equation 6.1. Here the final temperature was mea- 
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8.4 The middle processor with coordinates (1,1) in a two-dimensional system receives 
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11.1 Benchmark showing the number of frames per second (FPS) the billboard class is 
able to render with the number of billboard spheres from a few thousand to more 
than one billion visible spheres. The benchmark is performed on an NVIDIA 
GTX Titan 143 
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Introduction 



In the later years, the field of computational science has been merged with theoretical physics 
forming a new field; computational physics. Not only is the physics important, but numerical 
models, algorithms and code optimizing are important parts of the daily work. A computational 
physicist often spends most of the time on writing code - a task that can be both frustrating and 
enjoyable at the same time. The code should do as intended in addition to being both readable 
and efficient. 

A master student starting working on a thesis in computational physics must at some point 
choose a focus area somewhere in between computer science and physics. The author finds a 
great pleasure in both fields, but has in the work in this thesis spent most of the time doing 
code development. But of course, the underlying questions are physical questions whereas the 
tool we use to answer them is computer science. 

In this chapter we first present the project description - written by Prof. Anders Malthe- 
S0renssen - in section 1.1. It motivates for the need of having models of gas flow at the nanometer 
scale. We then briefly explain the process of shale gas extraction in section 1.2 where we introduce 
the origin of the physical questions the work of this thesis can be used to answer. In section 1.3 
the main goals of the work are presented before we in section 1 .4 highlight the parts of the work 
that are new and original, created by the author. We conclude the chapter with section 1.5 that 
is a brief overview of the structure of the whole thesis. 



1.1 Project description 



Most of the worlds currently accessible hydrocarbon resources are found in tight rocks - rocks 
with permeabilities in the millidarcy range and with pore sizes in the nanometer range. The 
development of technologies for production from tight rocks have changed the energy landscape, 
making countries such as US self-sufficient with gas and possibly also with oil, and the esti- 
mates from the producible reserves of hydrocarbons in tight rocks are continuously increased 
as new methods are developed and new plays are discovered. However, we are now at a stage 
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where technological and engineering methods have surpassed our basic scientific understanding 
of production from tight rock systems. 

Tight rocks pose new scientific problems because of the small length-scales involved. Traditional 
oil plays are found in for example sand stone reservoirs with millimeter to micrometer sized 
pores. For such systems, standard hydrodynamics is a sufficient tool to understand, describe 
and predict fluid transport, even for multiphase systems. However, in tight rocks, typical pore 
sizes are in the range of tens to hundreds of nanometers. For such systems, the finite size of 
the atoms and molecules that make out fluids and surfaces become important: The dielectric 
properties of water and surface charge distributions, the binding energies of the fluids to the 
surfaces, and the effects of surface shapes and irregularities on effective surface interactions 
become important. For example, the usual assumption in fluid mechanics of no-slip boundary 
conditions may no longer hold, the fluids may behave differently close to surfaces than in bulk, 
and for smaller pores the surface to volume ratio is larger than for larger systems, and for gases 
the mean free path may become comparable to the characteristic sizes of the porous medium. 
These effects introduce challenges in how to describe and model fluid flow and surface reactions 
in tight rocks. 

We have initiated an activity in tight rocks to address tight-rocks-specific effects for enhanced 
hydrocarbon production and C02 storage. A part of that initiative requires the development 
of better models to address fluid flow, both liquids and gases, in tight rocks geometries with a 
particular focus on shale systems. In this project, we will address fluid flow in tight rocks sys- 
tems by developing models to address atomistic effects for dilute gases and water in hydrophilic 
systems. To do this we need different models spanning various length scales. To address the 
flow of dilute gases in complex geometries on nanometers to micrometer length scales we will 
develop a method called Direct Simulation Monte Carlo (DSMC), that models a gas through ef- 
fective particles that collide with other gas particles with stochastic collision rules that conserves 
momentum and that interacts with surfaces through special reflection rules that can be tuned 
using for example theoretical, experimental or atomic scale modeling results. Such models have 
proven useful to address dilute gas flows in regular geometries, such as for tube and channel 
flows, but we need very general tools to address the complex geometries of tight rocks system as 
found in experimental, tomographical studies. To supplement the modeling of dilute gases, we 
will also need methods to address the dynamics and flow of water in small pores using atomic 
scale models. In that case we need to model specific materials, and we have access to a very 
good molecular dynamics (MD) model for the interaction between water and silicates that we 
plan to develop and use to address fluid flow in nanoscale geometries. In this project we will 
develop both a DSMC and a MD model to address gas and liquid flow in tight rocks system. 



1.2 Shale gas extraction 



Shale gas is a type of natural gas trapped in shales - tight rocks with pore sizes at the nanometer 
scale. In these reservoirs, the gas is stored inside already existing pore networks or adsorbed 
onto organic matter. In order to extract adequate levels of gas from such rocks, we generate 
fractures to increase the permeability of the rock. Today we usually use horizontal drilling; we 
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drill down from the earth surface until we reach the shale formation, there we continue drilling 
horizontally inside the shale (see figure 1.1). 

The fractures are generated by hydraulic fracturing, a process where millions of liters of water 
are injected under high pressure, cracking the shales. Proppants - small, usually spherical, 
ceramic particles - are mixed with water, keeping the fractures open even after the pressure 
is released. The gas then flows from the nanosized pores into the fracture network through 
very small channels. Even after the pressure is released, the pressure inside the shale formation 
remains higher than the surface pressure. This makes the gas flow through the drilling hole to 
the earth surface. The process of shale gas extraction couples physics at length scales spanning 
over 10 orders of magnitude. In this thesis, we focus on the smallest scale: the scales at which 
the gas is produced. We will study flow of dilute gases in nanoporous media. 
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Figure 1.1: Shale gas extraction principles. Hydraulic fracturing cracks the shales, releasing the 
trapped gas. In this process, physical phenomena on length scales from nanometers to meters 
may be relevant. The production occurs in nanometer pores, and the gas gathers in the drilling 
hole through the network of larger fracture. From the drilling hole, the gas flows to the surface. 
Illustration: Sigve B0e Skattum. 



1.3 Goals 



We have chosen the goals of this work so that they will lead to a solid base understanding of 
how the physics of fluids works at the nanometer scale. On the way, we develop many of the 
relevant tools. These tools can be used in further studies in the field. 
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The following were the main goals of this thesis: 

a) Develop a three-dimensional parallel DSMC model 

The DSMC model has been used for the past 50 years to study flow of dilute gases in 
systems where the mean free path is of the same order as a characteristic size of the 
geometry. Having an implementation of this model will allow us to simulate flow in systems 
with channels so small that continuum mechanics do not longer produce correct physical 
behavior. The implementation needs to be parallelized for large-scale parallel machines. 

b) Develop methods to model arbitrary 3D geometries 

Important systems for theoretical purposes are for example cylinders or other simple, math- 
ematically well-described geometries. They can in many cases have analytical solutions 
and be excellent test cases for a more general numerical model. However, real fracture net- 
works are usually defined by a complicated geometry with no closed form mathematical 
description. We therefore need to find a way to represent arbitrary geometries allowing us 
to measure flow properties in more realistic systems. 

c) Develop a three-dimensional parallel MD model 

Different surface materials may interact very differently with the same fluid. Take for ex- 
ample water. Some surfaces are hydrophilic, which means that they attract water, whereas 
hydrophobic materials tend to repel water. This will of course affect how the fluid flows 
through a system. The DSMC model is a particle model that performs stochastic colli- 
sions with collision rules that use parameters depending on the combination of the specific 
surface material and fluid substance. With a good MD model, we can both study flow in 
small systems and compute the gas-surface parameters we need in DSMC. 

d) Develop custom 3D visualization tools for large particle data sets 

Both models produce time trajectories of the particles from a given initial state. The 
output data is a set of particle positions which can be visualized to learn more about the 
fluid dynamics. By building such visualization tools from scratch, we can overcome the 
drawbacks that are in the already available free software (these drawbacks are discussed 
later) and create custom features that satisfy our needs. 

e) Study flow and dynamics of water in simple nanoscale silicates 

With an advanced atomic potential in MD, we can study how water flows in nanoscale 
silicates [25]. Such a potential can for example be used to study how hydroxyl groups on 
the surface of the silicate affect the water flow. This can also be used to produce gas- 
surface parameters that enable us to study larger scale systems with the same statistical 
surface behavior as water in silicates. 

f) Investigate how slip velocity affects the permeability in nanoporous media 

It is well known from experiments that the measured permeabilities in nanoporous media 
can be two orders of magnitudes higher than the theoretical predicted values. Klinkenberg 
explained this effect by the fact that the fluid can have a non-zero velocity near the bound- 
aries [16]. Slip velocity leads to a correction to the theoretically predicted permeability. 
We will study flow in different nanoporous media to see how well these corrections predicts 
the permeability in the stochastic DSMC model as well as the MD model for systems with 
different pore sizes covering two orders of magnitudes. 
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1.4 My contribution 

In every thesis, as in any other scientific work, the foundation of the produced content is results 
from previous work. It should be clear what new ideas the author has contributed with. This 
could for example be new theoretical calculations, models, algorithms or tools that has been 
developed. Since a master thesis is a larger document containing more information than just the 
new contributions, it might be less obvious which parts that are the unique work of the author. 
Such a document deserves its own section highlighting these parts. 

Both models we have studied in this thesis have been programmed and implemented from scratch. 
A total of approximately 20000 lines of code have been written in C++ and Python. Writing 
everything from scratch provides a great insight in both models, especially from a numerical 
perspective, since every detail of the implementation has to be understood. In this section we 
briefly discuss the contributions by the author. This section is not meant to be an introduction 
to any of the concepts, so it is assumed that the reader is familiar with the models at the time 
of reading. If this is not the case, everything in this section should be clear after reading about 
both models and the visualization program in chapters 4, 5, 7, 8 and 10. 



1.4.1 Direct Simulation Monte Carlo 

In the DSMC model, we need to represent the geometry of the system (of which the fluid is 
confined in). This method has to be fast, scalable (for parallelizing) and general so that we can 
represent an arbitrary geometry. To be able to perform collisions between the surface and the 
particles, the surface needs to have well defined normal and tangent vectors in every available 
collision point. The author has developed a voxelation model with a new algorithm to compute 
the normal and tangent vectors based on the neighboring voxels. This gives a smoother effective 
surface than just voxel values. We discuss this model in detail in section 5.1. 



1.4.2 Molecular Dynamics 

The MD code is a standard, but remarkably efficient code using the Lennard- Jones potential. 
The code structure and parallelization technique is based on what is teached at the University of 
Southern California 1 . In section 8.4 we discuss that we want to simulate a fluid in an arbitrary 
geometry with the MD code. The author has developed a simple, but very promising model 
of a solid that allows a set of atoms to behave as a solid, vibrating about their equilibrium 
point while interacting with the fluid with realistic atomic forces. In addition, with an applied 
thermostat on these atoms, we are able to drain the system for energy which is a necessity when 
we induce flow in the system by adding a constant force as described in subsection 4.5.3. The 
details of this model is discussed in section 8.4. 



1 See http://cacs.usc.edu/education/cs596/ParallelMD-VG.pdf for details. 
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1.4.3 Visualization 

The free visualization software available, such as VMD and Ovito, are great tools for studying 
data sets from atomic or molecular models. There are two significant drawbacks the author 
has noticed; the camera control and performance. In both of these programs, the camera is 
looking towards a point, usually in the center of the system, whereas the mouse controls the 
rotation around this point. To be able to study different parts of the system in detail, one could 
want to control the camera as in a first person shooter game. The author has, together with 
Svenn-Arne Dragly, developed visualization tools allowing us to visualize up to 100 million atoms 
simultaneously with decent frame rate with the camera control described above. In chapter 10 
we discuss how to make full use of geometry shaders, which allow us to render the spheres 
representing the atoms on the Graphical Processing Unit (GPU). 

1.5 The structure of this thesis 

This document is arranged in five parts. Chapter 2 opens with a brief discussion about the 
theory of fluid mechanics. We discuss how the continuum approach in standard hydrodynamics 
breaks down for dilute gases in nanoporous media, and what the current theory has to offer in 
predictions of the permeability. The largest subject of this study is the DSMC model in part 
I. It begins with an introduction to kinetic theory in chapter 3 which we use in 4 when we 
introduce the DSMC model. The implementation of the model is explained in chapter 5 with 
the numerical results presented in 6. 

In part II, we discuss MD, the second model we have studied. We begin by introducing the 
theory behind MD in chapter 7 with the implementation in chapter 8. The results are presented 
in chapter 9. The final part is concerned with our desire of creating a custom 3D visualization 
tool that we can use to visualize the simulations we perform with both models. We start by a 
brief introduction to OpenGL in chapter 10 where we explain graphics programming and how 
the graphics card can be used to draw geometrical models on a computer screen. In chapter 11 
we explain how to use advanced shaders in the rendering pipeline to render the time evolution 
of tens of millions of atoms on the screen. We also discuss the marching cubes algorithm that is 
used to render the surface defining the geometry in a DSMC simulation. 
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Most people have an intuition about what a substance is. A substance is just something, like 
water, wood, butter or glass. A metal is a substance. The blood in your body is a substance. All 
of these are made up of atoms that form molecules in ways giving them very different properties. 
We know that a substance can exist in different phases, for example plasma, solid, liquid and 
gas [23], in which the same substance can behave very differently. Liquids and gases are often 
called fluids because they share the property that they do not have a fixed shape and are often 
easily deformed. The theory that describes how fluids behave is called fluid mechanics. 

We start this chapter with section 2.1 where we discuss the concept of continuum. This leads 
to the Euler equations and the Navier-Stokes equations in section 2.2. These equations can be 
used to study the behavior of fluids in motion relative to the material confining the fluid - fluid 
flow. We then introduce the concept of porous media, a solid material with parts of its volume - 
the pore space - available for fluids. Fluids can flow through such a material, and the equation 
describing the flow rate as a result of some pressure gradient is called Darcy's law. One of the 
parameters in Darcy's law is called permeability. This quantity is discussed in section 2.5. 

Pore space with channels at the nanometer scale introduces a distinction between what we call 
macroflows and microflows. This leads to what is called the breakdown of continuum, which is 
addressed in section 2.7. The Knudsen number is discussed in section 2.8. It is used to quantify 
whether or not continuum models can be used, in addition of measuring the importance of slip 
velocity which has major consequences for the permeability. This is discussed in section 2.9. 
We then discuss particle models - which are not based on the assumption of continuum - in 
section 2.10. The last two sections are concerned with corrections of the permeability. These are 
called Klinkenberg correction and Knudsen correction, and use different models for slip velocity 
to predict the increase in permeability. 
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2.1 The continuum 



In reality, we know that a fluid is composed of an enormous number of atoms which are separated 
by mostly empty space. The atoms interact through quantum mechanics that allows molecules 
to form. Even though this is the true nature (well, every physicist should be careful to say 
that something really is the true nature) of the fluid, it turns out that we can describe it as a 
continuum, that is, we assume that the mass is continuously distributed in the total volume of 
the fluid. This also means that macroscopic quantities like temperature, density, energy and the 
fluid velocity are well defined in every point in space. This is great, continuum mathematics 
is so much easier to work with than discrete mathematics. Calculus tells us that well behaved 
functions (as we prefer to work with) can both be integrated and differentiated. They also treat 
us rather nice, they do what we expect them to do. 

In physics we strongly believe in conserved quantities. A fluid in motion will internally (far away 
from the boundaries in the container) have conserved energy, mass and momentum. This can 
be formulated beautifully with mathematics, and gives rise to what we today know as the Euler 
equations and the Navier-Stokes equations. 



2.2 The Euler equations and the Navier-Stokes equations 



With the concept of continuum, we think that every point in space has well defined physical 
properties like temperature, density and momentum. In 1757, Euler published what's called the 
Euler equations by applying conservation of mass, momentum and energy to a small volume 
element dV of a fluid. They form a set of differential equations describing how the fluid velocity 
u(r, t) field changes in time and space. The conservation laws can be written as 

+ V • (p m u) = 0 mass (2.1) 

^ + V-(u®(p m u))+VP = 0 momentum (2.2) 
0E 

— + V • (u(E + P)) = 0 energy, (2.3) 



where p m is the mass density, u(r, t) is the velocity field, -E^r, t) is the total energy per unit 
volume, P(r, t) is the pressure field and <g> is the tensor product. These can be combined to one 
vector equation, but its origin, the connection to conservation laws, is more clear when they are 
separated like this. In the original paper, Euler only derived the first two equations, but the 
full set is usually referred to as the Euler equations. They describe the motion of fluids with 
negligible viscosity, which is a decent approximation for many fluids. 

Some 80 years later, in the 1840s, Sir George Stokes published the Navier-Stokes equations 
(NSE) which can be seen as an extension of the Euler equations where effects caused by the 
viscosity of the fluid are included [4]. The Navier-Stokes equations for an incompressible fluid 
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can be written as one vector equation 

p m — =/9 m F- Vp + AiV 2 u + /xV(V-u) (2.4) 

where F is an external force (i.e. gravity or an electrostatic field), fi is the viscosity and D/Dt 
is the material derivative defined as 

^ = l + uV - < 25) 

The NSE have quite a few interesting analytically solvable solutions, but for most real systems, 
the geometry confining the fluid is so complicated that it is usually solved on computers. When 
solving the NSE, we have to provide boundary conditions to get a unique solution for the system. 
If we at one end of the container apply some pressure Pq, and some other value Pi in the other 
end, we get a flowing fluid since there acts a net force on the fluid. This defines the pressure 
difference AP. We then also specify the velocity at the boundary which often is the no-slip 
boundary condition, i.e. that the fluid velocity is zero at the container walls. It turns out that 
for real fluids, this is not always true. In section 2.9 we discuss the effects of slip velocity and 
how this affects the flow properties of the fluid. 



2.3 Flow in porous media 



Fluid flow can be defined as fluid in motion relative to its container - the material confining the 
fluid. A material with parts of its volume available for fluids is called a porous medium (see 
figure 2.1). 

We call the larger regions available for fluids pores whereas channels connecting these pores are 
called the pore network. All available such space is called the pore space. If the porous medium 
has a total volume V, and the pore space takes up a volume V p , we define the porosity (ft as 

Pore space volume V p 
Total volume V 

When a fluid flows through the material, the amount that flows through a surface per unit time 
is called the volumetric flow rate and is usually denoted by Q. This quantity measures how 
many cubic meters of fluid we can push through a surface orthogonal to the flow direction per 
unit time. If we increase the pressure difference, we expect a higher flow rate. This is indeed 
true. In fact, the volumetric flow rate is proportional to the pressure difference. 



(2.6) 



2.4 Darcy's law 

When we apply a pressure difference on each side of a material filled with a fluid, the fluid will 
start to flow in the direction of lower pressure. In 1856, H. Darcy found a linear relation between 
the pressure difference and the fluid flow rate. This relation is called Darcy's law and tells us 
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Figure 2.1: An example of a porous medium. Here we see a metal foam - a solid metal with 
pore space available for a fluid. A nanoporous medium is a porous medium where the size of the 
pores are at the nanometer scale. Image from http:/ /en.wikipedia.org/wiki/File:Metal_Foam_ 
in_Scanning_Electron_Microscope, _magnification_10x.GIF, accessed 28 March, 2014. 



what volumetric flow rate Q we can expect from an incompressible fluid through a material of 
length L, when we apply a pressure difference AP, see figure 2.2. The one dimensional version 
of Darcy's equation is given as 



u 



9. 

A 



a D - 



AP 

T 1 



(2.7) 



where u is the volumetric flux (volumetric flow rate per unit area) , AP = Pq — Pl is the pressure 
difference, A is the cross sectional area; the area of the material orthogonal on the flow direction 
and L is the length of the material in the flow direction. o~d is the proportionality constant that 
can be written as 



o-d 



k 



(2- 



where [i is the viscosity and k is the permeability. 
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Figure 2.2: A box with volume V = LA with fixed pressure values at x = 0 and x = L. The 
volumetric flow rate Q through a cross sectional area A is given by Darcy's law in equation (2.7) 

2.5 Permeability 

The motivation of introducing the concept permeability is to separate the proportionality con- 
stant into two parts; one that depends on the liquid only, the viscosity /i, and the permeability, 
a material specific constant k. This means that we in principle can do an experiment with a 
liquid with known viscosity, say water, and measure the permeability of some material (Darcy 
studied a sand filled cylinder in his original experiment). Once you know the permeability, you 
are able to predict the flow rate through the material for any other liquid with a well known 
viscosity. This is of great importance for i.e. the oil industry where they ideally would like to 
take a sample of the rock in which the oil or gas is confined, measure the permeability with e.g. 
air, and then use this to predict the recovery rate. 

This is of course not completely true in all circumstances. While Darcy originally found the 
relation as an empiric equation based on experiments, it can be derived from the Navier-Stokes 
equations. Darcy's law is only correct if the fluid flow satisfies the no-slip condition. 



2.6 Macroflows and microflows 

In the 1990s H. Bau and J. Zemel performed experiments on microchannel flow in which they 
found clear deviations from what was expected from the theory [15]. It is useful to introduce 
the terms microflows for flow in geometries where the distance between the channel walls is of 
order micrometer or smaller, and macroflows for larger systems (millimeter and above). Flow 
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at microscales differ from macroscales because of effects that can be classified into four groups 

• non-continuum effects, 

• surface-dominated effects, 

• low Reynolds number effects, and 

• multiscale and multiphysics effects. 

In this thesis, we focus on the non-continuum effects which are briefly discussed in section 2.7, 
and the surface-dominated effects as the slip condition described in section 2.9. See [15] for 
details about the effects of low Reynolds number, multiscale and multiphysics. 

2.7 The breakdown of continuum 

As we discussed in section 2.1, a fundamental assumption in the NSE is that the space is 
continuous, but we know that in reality, the mass of the fluid is concentrated in the center of the 
atoms. We often assume that the mass is uniformly distributed in the volume element of which 
the conservations laws are applied on. This is known as the continuum hypothesis and is invalid 
when the mean free path A, the average distance a particle moves between collisions, becomes 
comparable to some characteristic length L in the system, i.e. the diameter of a channel[15]. 
This is quantified through the Knudsen number 



From the kinetic theory we can calculate the mean free path (this is done in section 3.7) 



where p n is the number density and m and d are the mass and diameter of the particles. By 
using the ideal gas law P = p n ksT, we can replace the density with the pressure P and the 
temperature T. Here ks is Boltzmann's constant. Molecular Dynamics simulations have shown 
large fluctuations in temperature and density near the wall in layers a few mean free paths 
from the surface. Such effects are not reproduced in models based on continuum equations 
[15]. Another very important effect, as we will discuss in the next subsection, is the non-zero slip 
velocity of the gas. However, for Knudsen numbers smaller than 10 -2 , the continuum hypothesis 
is valid and we can use continuum equations like the NSE and the Euler equations. It turns out 
that the Knudsen number is very useful. 




(2.9) 
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2.8 Knudsen number 



The Knudsen number is the ratio between the mean free path - the average distance a particle 
moves between colliding with another particle - and a characteristic length in the system. This 
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length could be the radius of a cylinder or the radius of a spherical pore or some other length 
scale that is representable for the system. In other words, it can be related to how often a 
particle collides with the surface compared to other particles. A Knudsen number larger than 
one indicates that a particle travels longer before it collides with another particle than the 
distance to the surface which means that surface effects are of increasingly importance as the 
Knudsen number increases. 

To get an idea of the length scales where the Knudsen number is about unity, we first need to 
calculate the mean free path. An oxygen atom in air at room temperature has a mean free path 
[10] 

A Air = 8.0 x 1(T 8 m, (2.11) 
whereas the mean free path in water is 

Awater = XlO- 11 m. (2.12) 

This means that air in a nanoporous media with typical pore size about 80 nm, the Knudsen 
number is of order unity and the continuum hypothesis is invalid. Water in the same system 
has a Knudsen number approximately 1 x 10~ 4 and continuum models should in principle hold. 



2.9 Slip velocity 

The usual boundary condition we apply when solving the NSE is the no-slip condition where 

u(r;t) = 0 reSfi, (2.13) 

where <9f2 defines the boundary domain. The history of the no-slip condition was studied by 
Day [9], based on the work of Stokes in the 19th century. Stokes compared theoretical results to 
experiments for pendulums of different kinds and concluded that 

I shall assume, therefore, as the conditions to be satisfied at the boundaries of the 
fluid, that the velocity of a fluid particle shall be the same, both in magnitude and 
direction, as that of the solid particle with which it is in contact. The agreement 
of the results thus obtained with observation will presently appear to be highly 
satisfactory. (Stokes, 1901) 

In Day's detailed study of the no-slip condition, he says 

Looking back, it appears that the acceptance of a more general no-slip condition 
was prolonged because of experimental shortcomings, not because of a lack of the 
appropriatetheoretical solutions to fluid flow problems. (Day, 1990) 

In other words, the theoretical framework that existed already in the time of Stokes were com- 
plete enough to include both slip and no-slip solutions. In fact, Maxwell predicted slip velocity 
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in a paper already in 1867[17], but the experiments the next 50 years seemed to more or less 
confirm the no-slip condition. 

That a fluid has a slip velocity is rather obvious when reading Klinkenberg's nice argument 

Consider a layer adjacent to the wall which is thinner than the mean free path 
A of the gas molecules, so that practically a molecule does not collide with other 
molecules present in this layer. At a given moment half of the gas molecules in this 
layer will have a component of velocity moving towards the wall; the other half in 
the opposite direction. The molecules moving towards the wall have had their last 
collision somewhere in the flowing mass, and, therefore, will have an average velocity 
component in the direction of flow different from zero. A part of this average velocity 
component will be lost in colliding with the wall. Even if the molecules lose it entirely, 
then still the average velocity component in the direction of flow of all the molecules 
contained in the layer will amount to half of the average velocity component of the 
molecules moving towards the wall. The gas in the layer, therefore, will have a finite 
rate of flow. (L.J. Klinkenberg, 1941) 

It is convenient to introduce the concept of slip length l s to be able to quantify the slip velocity. 
Slip length is defined as the distance into the wall we would have to extrapolate a velocity profile 
for it to reach zero value, see figure 2.3. Maxwell theory predicts the following relation between 



v(r) v(r) 




I — I Slip length 



Figure 2.3: Slip length is the distance into the wall we would have to extrapolate a velocity 
profile for it to reach zero value. We have the no-slip condition on the left, where the slip length 
is zero whereas we have a non-zero slip length on the right. 

the slip length and the mean free path 

l s = a\, (2.14) 

where a ~ 1.15 is the slip coefficient [20]. The effects of slip velocity become more apparent 
when the channel diameter is of the same order as the mean free path. By introducing the 
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dimensionless slip length 



L 



= a— = aKn, 



(2.15) 



we see that the ratio of the slip length to the channel diameter is proportional to the Knudsen 
number. The actual slip velocity (the average velocity of the molecules right next to the wall) 
can be written as 



where n is the direction normal on the wall[16]. We call this a first order slip model since it is 
contains only the first derivative of the velocity. Higher order models exists and give corrections 
to the permeability that are important in nanoporous media, where the channels that contribute 
to flow are of nanometer scale. This is discussed in section 2.13. 



2.10 Particle models 

For systems where the continuum hypothesis is invalid, we need other models describing the 
behavior of the particles in our system. The first idea that might pop our minds might be to 
study the system at the atomic level. The equations of motion and hence the dynamics of a 
system can in principle be calculated directly from quantum mechanics by solving Schrodinger's 
equation. Since this requires calculating the wave function of every atom with complex atomic 
interactions, the size of the system needs to be very small with today's computers. 

An alternative, popular approach is to use a parameterized potential U(r N ) (r N being the po- 
sitions of all atoms), and calculate the forces through the gradient of U. Newton's equations of 
motion is then integrated and the dynamics of the system are determined in a classical, determin- 
istic way where important effects from quantum mechanics are embedded in the potential. This 
method is called Molecular Dynamics and is studied in chapter 7. Molecular Dynamics is orders 
of magnitudes faster than models solving Schrodinger's equation, but it still needs a detailed 
description of the dynamics of every atom in the system. For many problems, this information 
is redundant because what's really important is the statistical properties of the system. 

In statistical mechanics, we don't need the full information about every single atom. We can 
then develop models using statistical mechanics and save a lot of computation power compared 
to Molecular Dynamics. One such model is called Direct Simulation Monte Carlo and is studied 
in chapter 4. The fundamental equation in this case is the Boltzmann equation, which we 
derive in section 3.4. In the limit of low Knudsen numbers, these models do of course converge 
towards the continuum models. It is convenient to classify different flow regimes depending on 
the Knudsen number. 



dv 
dn 



(2.16) 
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2.11 Flow regimes 

We can divide the entire Knudsen range into different regimes enabling us to get an overview 
of which equations and models that are valid for which Knudsen numbers. In the low Knudsen 
number limit, the continuum hypothesis is valid and we can in principle use any of the models we 
have discussed. The continuum approach is here of course preferable since it more computation- 
ally efficient compared to the particle models. At some point (Kn> 0.01), the no-slip boundary 
condition is invalid and we need to incorporate this into the models solving the NSE in order to 
get accurate solutions. Here starts the slip regime. When the Knudsen number approaches 0.1, 
we start to see transitional flow where the flow is laminar near the edges and turbulent in the 
middle of the material, before we at Knudsen numbers larger than 10 have free molecular flow 
where the particles almost never interact with each other. This is illustrated in figure 2.4 where 
we have included the regions where different equations are valid. 
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Figure 2.4: The four flow regimes covering the important regions in the Knudsen number range 
where different flow types appear. In the low Knudsen number limit, the fluid can be assumed 
to be a continuum and we can use equations like the Euler equation or the NSE. For larger 
Knudsen numbers, the no-slip boundary condition is invalid and we need a model satisfying slip 
velocity. We reach the transition flow regime at Kn~ 0.1 where the continuum models do no 
longer hold, even with slip boundary conditions. In the high Knudsen regime particle collisions 
are so rare that it is classified as free molecular flow. In this range, the collisionless Boltzmann 
equation is valid (see section 3.4). 



2.12 The Klinkenberg effect 

In 1941, L.J. Klinkenberg published a paper explaining the discrepancies that was found in 
experiments when measuring the permeability for both liquids and gases in the same material[16]. 
His discoveries were of great importance in the oil industry 
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It has become common practice in the oil industry to determine the permeability of 
core material with dry air; the equipment usually employed for this determination is 
arranged to operate with the outlet of the sample at or near atmospheric pressure. 

(Klinkenberg, 1941) 



He introduced a linear scaling function / c (Kn) that relates what he called the apparent per- 
meability k a - the permeability measured in the lab for a fluid with arbitrary density - to the 
absolute permeability koo, the permeability for a liquid in the high density limit. The absolute 
permeability is a constant dependent only on the porous medium. The relation is given as 

k a = fckoo ={ 1 + 4c 4) k °° = ( X + 4aKn ) k °°> ( 2 - 17 ) 

where L is the diameter of the channel and a ~ 1.15 is the no-slip factor in equation (2.14). 
Since the mean free path is proportional to the inverse pressure, we can write the scaling function 
as 

/ C =(l + ^), (2-18) 

where b is a constant depending on the material. In figure 2.5, we see that the Klinkenberg 
correction factor predicts a permeability that almost fifty times higher for a gas in a material 
with Kn = 10. The figure also contains the Knudsen correction factor which is discussed in the 
next section. The Klinkenberg correction factor is derived using the first order slip velocity in 
equation (2.16) which is not sufficient for high Knudsen numbers as we will see in section 6.3. 
Based on higher order slip velocity models, one can derive better permeability corrections. 



2.13 Knudsen's correction 



Beskok and Karniadakis (1999) developed another correction factor that uses a second order slip 
velocity model 



where a(Kn) is given as [7] 



f e = [1 + a(Kn)Kn] 



a(Kn) 



1 + 



Q'O 



4Kn 
1 + Kn 



1 + 



(2.19) 



(2.20) 



Kn^ 



where A = 0.170, B = 0.4348, and a® = 1-358. These are fitted parameters based on a large 
dataset of Loyalka and Hamoodi (1990). We see in figure 2.5 that the Knudsen factor predicts 
approximately 40% higher correction than the Klinkenberg factor. In chapter 6 we will check 
the validity of this correction factor for a simple cylinder, and discuss the practical problems 
we meet when studying flow in complex geometries where the system does not simply have one 
Knudsen number, but rather a distribution of Knudsen numbers. 
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Figure 2.5: A comparison between the Klinkenberg factor and the Knudsen factor shows that 
slip velocity leads to even higher corrections to the permeability than what the Klinkenberg 
factor predicts. We see that in the high Knudsen number limit (Kn = 10), we can expect an 
increase of permeability by a factor 50 compared to a liquid or a high density gas. The line 
labeled Relative shows the ratio between the Knudsen correction and the Klinkenberg correction 
factors. 
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Before we describe the first model, the Direct Simulation Monte Carlo, we need to discuss some 
theory. The model is based on the kinetic theory of gases, which is a microscopic theory that 
describes the behavior of gases at the molecular level. A system of N particles is fully described 
by the 3iV momentum components combined with the 3N spatial coordinates. Together, this 
forms a 6N dimensional phase space where each point represents the state of the system in 
an ensemble. We start this chapter by introducing the distribution function, the concepts of 
microstates, macrostates and ensembles in section 3.1, before we in section 3.2 explain how 
we measure the macroscopic observables which are average values over all the states in an 
ensemble. Then we have a brief discussion about ergodicity in section 3.3 which is a very 
important assumption when we start measuring physical quantities in a numerical statistical 
mechanics model such as Direct Simulation Monte Carlo (chapter 4) and Molecular Dynamics 
(chapter 7). In section 3.4 we derive the Boltzmann equation which is the fundamental equation 
that governs the behavior of the distribution function. We then define what an equilibrium 
state is which we use to derive the Maxwell-Boltzmann velocity distribution in section 3.6. As 
we might remember, the Knudsen number is an important dimensionless quantity that we use 
to quantify how important surface and non-continuum effects are. The Knudsen number is the 
ratio between the mean free path A and some characteristic length L of the system. In section 
3.7 we calculate the mean free path which is used to compute the mean collision time t co h, which 
is important to choose a good timestep in the Direct Simulation Monte Carlo model in chapter 
4. 



3.1 The distribution function 



A point (r, v) in the phase space describes what is called a microstate and contains a massive 
amount of information. Given this point, we would know the position and velocity to every 
particle in the entire system. In a liter of an ideal gas under standard pressure, the number 
of particles is of order 10 22 [12], so if each of these 6N coordinates were represented as an 8 
byte double on a computer, we would need more than 10 11 terabytes of memory just to store 
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all the information. This approach would be very inconvenient and, fortunately, not at all 
necessary. The really interesting properties in a system are the macroscopic ones, like energy, 
temperature, pressure, volume, average velocity among others. For example, the total energy in 
a gas consisting of N particles is calculated as 

N 1 

E = J2- mi vf + V(r), 
i=i 

where rrii is the mass of particle i, Vi is its scalar velocity and V(r) is the total potential energy 
in the system depending on the full 3iV-dimensional spatial coordinate r. 

Given a microstate, what happens if we switch two particles, say particle i and j? If particle 
i had velocity Vj = u and particle j had some other velocity Vj = w, we could quickly swap 
them so that Vj = w and Vj = u (theoretically of course, it would be a difficult task in an 
experiment). If their masses are identical, the total energy of the system would not change, 
but since we know that we switched the two particles, we could want to count this as another 
microstate. We could in principle paint the particles with different colors, or maybe just label 
them with their own unique number. However, in a real, mono-atomic gas, we can't really tell 
the difference if particle i and j secretly agreed to switch places without telling us. If they did 
so, it would not count as different microstates, the system remains exactly the same. We say 
that the particles are indistinguishable. 

If we instead increase the velocity of particle i, we can reduce some another particle fs velocity 
to keep the total energy constant. Even though the macroscopic property energy is unchanged, 
there is a (theoretically) measurable difference between these two states. The set of all mi- 
crostates that share the same macroscopic state variables (a macrostate) forms an ensemble of 
systems. A much used ensemble is the microcanonical ensemble (NVE) with a constant number 
of particles N, constant volume V and constant energy E. Increasing the velocity of particle 
i while at the same time reducing particle j's velocity just enough to remain the energy un- 
changed does not change the particle number N, the volume V or the energy. So these two 
different microstates would both be in the same ensemble. 

In a typical system, the number of microstates in a macrostate is so huge that the phase space 
points can be described by a continuous density function /(p, r, t) without losing any important 
information [19]. The input parameters are the 3iV momentum components, the 3iV spatial 
coordinates plus time. This function is often called a distribution function, normalized so that 

|//(p,r,i)dpdr = jV, (3.1) 

where dpdr = dpidp2---dr37v is the 6N dimensional phase space volume element. This density 
function does not contain the information about the exact positions or momenta of the particles, 
but the probability to find the system in a state around a given phase space point. We can then 
use it to calculate measurable, macroscopic average values. 



Section 3 



Ensemble averages 35 



3.2 Ensemble averages 

Given the distribution function /, we can calculate any ensemble average (which will be the 
measurable, macroscopic properties of the system) by interpreting / as a probability distribution 
(it needs the factor 1 /N to be normalized to one) that gives the probability of finding a particle 
at position r + dr with momentum in the range p + dp at the time t. We can then use the 
standard expectation value expression to calculate a macroscopic property A 

A{t) = JfJJ A (P' r > *)/(P> r ' *) d P dr " ( 3 - 2 ) 
This could for example be the total energy 

E(t) = ±JjE(p,T,t)f(p,i, t)dpdr (3.3) 

-i//(K(r> + g|Q/<*M)dpdr > (3.4) 

where pj is the momentum of particle i. Any other quantity of interest can in principle be 
measured in the same way. 

3.3 Ergodicity 

The ensemble average calculates the average value of some macroscopic quantity given the 
distribution function /. Usually, we don't have the distribution function, except in some very 
simple theoretical calculations. Even then, it might be difficult to compute the integral in 
equation (3.2). The usual situation when we do numerical statistical mechanics is that we have 
a way to explore the phase space, hoping that it helps us visit states with probabilities according 
to the given ensemble. Many Monte Carlo techniques (such as the Metropolis algorithm) allow 
us to go to a new, random point in the phase space, and calculate the probability of going from 
the current state to the new state. From this, we can count the number of times we have visited 
different regions of the phase space and create a histogram or maybe, if we're lucky, fit some 
existing probability distribution to our data. 

Another approach is to let the rules of physics take us around in the phase space, from one state 
to another, following Newton's equations of motion. Imagine that at t = 0, our system is in 
some microscopic state (a single phase space point (r(0),p(0))) and at a later time t = t has 
moved to (r(r),p(r)). Between these two points, the system has moved through many other 
points, exploring the phase space. It seems reasonable that in the limit of infinite time, the 
time evolution should visit the phase space points according to the density given by /. If not, 
it wouldn't make any sense even talking about /, since it is the time evolution we, as humans, 
are experiencing in the real physical world. 

This is called the ergodicity hypothesis, the assumption that a system following the laws of 
physics, explores the phase space with the probability of being in a region proportional to the 
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density in that region. The average value of a macroscopic quantity A is then found as 

A = Hm - f A(r(t'),p(t'),t')dt'. (3.5) 

t^oo t J 0 

3.4 The Boltzmann equation 

In the section 3.1, we introduced the distribution function / that describes the density in a 
6iV dimensional phase space. Now, if we know the distribution function at t = 0, we could in 
principle compute any property of the system. But at some later time the distribution function 
might have changed, unless, of course, d//dt = 0 in which the system would be in an equilibrium 
state. We can, by applying conservation of probability, derive an equation of motion for /. This 
equation is called the Boltzmann equation and describes how / evolves through time by assuming 
that any change of probability around a point (r, v) at the time t must be due to 

• flow through a surface in the phase space, 

• an external force, or 

• internal collisions. 

All three will change / in different ways. For simplicity, we will first assume that the particles 
do not collide and derive the collisionless Boltzmann equation. But do not worry, we will add 
the collision term later and end up with the full Boltzmann equation. 



3.4.1 The collisionless Boltzmann equation 

Consider the density /(r, v,t)drdv around the phase space point (r,v) at the time t. Then, at 
a time dt later, if we assume no forces and that the total number of particles has not changed, 
that chunk of density has moved to (r + vdt,v). Conservation of probability states that any 
change of / within a volume f2 must flow through the boundary dQ, 

£ // / drdv = - / dv / /(v ■ n r ) dS r (3.6) 

= -! dv I V r • (/v) dr = - jj V r • (/v) drdv (3.7) 

which becomes 

% + V r -(/v) = |+vV r / = 0 (3.8) 

since v is independent of r. We can extend this equation by adding the effects of an external 
force F that changes the velocity in the same way as the position was changed above (except 
for the factor 1/m) 

df F 

^T+v-V r / + V v --/ = 0, (3.9) 
at m 
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which we call the collisionless Boltzmann equation. It is a good approximation to describe the 
dynamics of very dilute gases where intermolecular collisions occur rarely. But we should not 
ignore collisions between particles, so as promised, we will now see that by treating collisions 
will appear as an additional term. 

3.4.2 The collision operator 

We consider a dilute gas so we can assume that only binary collisions occur (we ignore the 
contribution from collisions between three or more particles at a time). We also assume that the 
total energy, momentum and mass is conserved in all collisions. Then consider two particles i 
and j moving towards each other with velocities Vj and Vj , and relative velocity v re i = Vj — Vj . 
We define particle i as the incident particle and j as the target particle. After the collision, the 
particles will have velocities and v'- with relative velocity v rel = — V- . 

In order to make the calculations simpler, we change the frame of reference, in which we label 
the velocities with a tilde so that v — > v. If we choose the target particle as initial frame of 
reference, we see that the velocity of the incident particle b ecomes — v re j and — v re j. Since 
momentum is conserved, we know that the relative velocity must remain constant, |v re i| = |v rel |, 
during the collision. The direction of v rel is given by the angles 4> and 6 with z along v re i and 
4> £ [0, 2tt], 9 € [0, 7r]. We can express v rel as 



Vrei = v rel ~ 2e(e • V re l) 



(3.10) 



where e is an arbitrary unit vector. If we multiply by e, we see that 



e • v rei = e • [v re i - 2e(e • v rei )] = -e • v rei 



(3.11) 



which gives the symmetric relation 



Vrei = v rel -2e(e-v rel ). 



(3.12) 



The angle between v re i and v rel is 6, so 



v rel • v rei = v 2 ml cos 6 = w r 2 el (l - 2 cos x), 



(3.13) 



where x is the angle between e and v re i, which gives the relation 



9 = 7T - 2 X . 



(3.14) 



We now define the solid angle element dfi = sin #d#d(/> about v' r 



rel 



v re \dQ = v re i s'm8d9d(j) = 2v re \ sin(-7r — 2x)dxd(p 

= 4v ie \ cos x sm X^xd^ = 4 | e • v re i | sin xdxd<f> 
= 4|e-v re i|d 2 e, 



(3.15) 
(3.16) 
(3.17) 



where d 2 e = sinxdxd^ is a solid angle element about e. In the following, we will calculate the 
scattering cross section which is the area that describes the likelihood of an incident particle 
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being scattered by the target particle. We denote the number density p n and find that the 
incident flux is p n v X6 \. The rate h' d Q of scattered particles into df] is 

h' dn = PnVrelCrdCl, (3.18) 

where the proportionality constant a is the cross sectional area. We might have several target 
particles colliding independently of each other which will contribute to the scattering rate. If 
we have m such particles, we obtain the total scattering rate hdn by multiplying (3.18) by nt 

hdn = n t p n v re[ o<19>. (3.19) 

The differential cross section a depends on v re i and v£ el , so we denote it as a = cx(v re i — > v^. el ), 
whereas the total cross section a? is given by integrating over all solid angles 



a T = Jo-dVL. (3.20) 

We will now look at particles with velocities in the range [v, v + dv] incident on target particles 
with velocities in the range [vi, vi + dvi]. The incident flux is gf(v, r)dv and the number of 
target particles is f(v i, r)dvidr. The rate at which particles with velocity vi are scattered by 
particles with velocity v is 

_f(v)/(vi)i; re i<7 dQdvdvidr = /(v)/(vi)4 |e • v re i| a d 2 edvdvidr. (3.21) 

The rate of loss is the rate of which particles in dvdr are being hit by other particles. We can 
calculate this by integrating over all solid angles d 2 e and incident velocities dv 



rate of loss 



f(r, v, t)f(r, vi, i)4 |e • v re i| a d 2 edvi 



dvdr. (3.22) 



We also have the inverse event, incident particles with velocity v' hitting target particles with 
velocity v^ so that the final velocity of the target particles is v. This is calculated with the same 
idea 



rate of gain 



/(r, v', t)f(r, v[, t)A |e • v re i| a d 2 edvi 



dvdr, (3.23) 



since |e-v re i| = |e-vL J and dv'dv^ = dvdvi. The total change in the distribution function is 
given by the functional J[f] 

J [/] = /// 4 I 6 • v re lk[/7( " fh) dvid 2 e (3.24) 

"v iel a[f'f[ - //a] d Vl dO, (3.25) 



where / = /(r, v,t) and f\ = /(r,Vi,f). The full Boltzmann equation is then given by 

|f +v-V r / + --V v / = J[f}. (3.26) 
ot m 

In the derivation of the collision operator J[f], we assumed binary collisions only. This is a decent 
approximation that holds for low densities. By defining the force range D and the dimensionless 
parameter v = p n D 3 , the Boltzmann equation is valid when v is small [18]. D 3 defines a volume 
around a particle in which the forces cannot be neglected. This makes v the average number 
of particles within that volume. If that number is small (less than unity) then we can safely 
neglect collisions between three or more particles. 
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3.5 .fT-theorem 



Now we have an integro-differential equation describing how the distribution function evolves 
through time. We will now look at Boltzmann's .ff-theorem which is a powerful result that gives 
us all the theoretical tools we need to implement the DSMC method. We will use this to derive 
what velocity distribution a gas in equilibrium obeys. In addition, we will find the mean free 
path and the mean collision time which both will be used in the Direct Simulation Monte Carlo 
method. Let us define the .ff-function as 

H(t) = (In /) = jj /(r, v, t) In /(r, v, t) drdv, (3.27) 
and differenciate with respect to time 

^ = //|(ln/)d rdv + jj% drdv = //f In/drdv, (3,28) 
where the last term vanished since the number of particles is conserved 

We multiply the Boltzmann equation by In / and integrate over r and v 

JJ^f% drdv = " //( ln /) v • V r/ drdv - yy"(ln /) ^ ■ V v / drdv 

+ yy*ln/J[/]drdv. (3.30) 
The first integral on the right can be integrated by parts 

J f (ln /) v • V r / drdv = jj /(In /)(v • n) dr r dv 

7m/(V r -v)drdv = 0, (3.31) 



II' 



where dr r indicates the boundary of the spatial domain. The integral is zero if we assume that 
/ is zero at the boundaries and v is independent of r. Similarly with the second integral on the 
right in equation (3.30) 

^ J J (hi /)F • V v / drdv =l||/ln/(F.n) dr v dr 

-^JJfhx /(V v • F) drdv = 0, (3.32) 

where dr v indicates the boundary of the velocity space. This integral is also zero since / is zero 
when v — > ±oo and the force is independent of the velocity. By recognizing that the left hand 
side of equation (3.30) actually is dH/dt, we end up with 



dH 

~dt 




In f[f'f[ - fh] 9 a dftdvxdv, (3.33) 
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which can be written as[19] 



dH _ 1 
~dt ~ 4 



hi 



fh_ 
f'fi 



[/7I-//i]^dOd Vl dv. 



(3.34) 



The integrand is of the form — (x — y) ln(x/y) which is negative for x ^ y and zero for x = y. 
This means that 



dH 

~dt 



<0, 



(3.35) 



which is called the If-theorem. Since H{t) is bounded (see equation (3.27) and remember that 
/ indeed is bounded), in the limit t — > oo, H(t) reaches an equilibrium state as dH/dt = 0. This 
in turn means that the integrand in (3.34) must be zero as well which happens if 



f'fi — fhi 



(3.36) 



which allows us calculate the equilibrium velocity distribution. 



3.6 The Maxwell-Boltzmann distribution 



We will now find out which / that satisfies equation (3.36) which we rewrite by taking the 
logarithm on both sides 



ln/' + ln/( = ln/ + ln/ 1 



(3.37) 



This states that the sum of of the logarithm of / is unchanged during a collision. As we required 
in subsection 3.4.2, energy, momentum and mass are conserved quantities, so In / must be a 
linear combination of these 



In / = am + j3 ■ (mv) — 7 



mv 



am + 



m (3 ■ (3 m.7 
~2~ 2~ 



(3.38) 
(3.39) 



for some real values a, f3 and 7. We combine all quantities not dependent of v into one parameter 
In c so that 



which we can write as 



In / = In c 



/(r, v) = cexp 



7717 



6 



rrvy ( /?Y 



By using that 



p n (r,t) = Jfdv, 



(3.40) 



(3.41) 



(3.42) 
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and 



P 

in addition to the equipartition theorem 



vo = (v) = - /v/dv, (3.43) 



h B T = ^((v- Vo f), (3.44) 

we can write equation (3.41) as [18] 

/ m \3/2 _ m , v |2 

^-"{irtif) eV - (3 - 45) 

which is the Maxwell-Boltzmann distribution. We can integrate out the position part of the 
distribution to get 



m \ 



3/2 



P(v)dv = \2^f ) 6 ^ dV - (3 ' 46) 

We will use this to validate the DSMC code section 6.1. Now, let us use this to find some 
interesting properties of the gas. Given a temperature T and the mass of the particles m, what 
is the average speed of the particles? First, we need to transform the distribution from a vector 
distribution to a scalar distribution (the magnitude of the velocity might be a more interesting 
quantity than a given direction). The distribution in equation (3.46) is spherical symmetric, so 
we should transform to spherical coordinates which gives 

/ \ 3/2 2 

P(v)dv = 4?r — -— v 2 e 2k BTdv. (3.47) 



2irk B Tj 

The average speed of the particles can then be found as 



(r) i vP(v)dv = ' ls/2 J k ' il . (3.48) 
\l 7r V m 



The particles in a high temperature gas moves faster than particles in a gas with low temperature, 
as expected. The next important quantity we should compute is the mean free path. We will 
need that to calculate the Knudsen number from section 2.8. 



3.7 Mean free path 

The mean free path A is the average distance a particle travels before it collides with another 
particle. We imagine a particle with diameter d moving and find its effective collision area to 
be (see figure 3.1) 



a = vr(2(f) 2 . 



(3.49) 
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Figure 3.1: A particle with diameter d swipes out a cylinder with diameter 2d, defining the 
collision area A = ir(2d) 2 containing all points of which a collision partner can have its center 
in. 



Two particles i and j, with velocities Vj and Vj, have the relative velocity v re i = Vj — vj. The 
norm is given as 



V re \ = V v rel " V re l = yj (Vj - Vj)(vj - Vj) (3.50) 

= \/ v i ■ Vj - 2v,Vj + VjVj, (3.51) 

from which we can find the average relative velocity by assuming that the velocities are com- 
pletely random, and hence not correlated, and that the particles have the same mean speed 

(v) 



(Vrel) = V V I+ V 2 = V2(«), (3.52) 



During a time r, assuming average relative velocity v2(f), the total volume sweeped out by 
particle i is 



V = 7rd 2 V2(v)T, (3.53) 

which in turn gives the number of collisions during such a volume 

"-coll = Vp n = V2ivd 2 (v)p n T, (3.54) 

where p n is the number density. The mean free path is then calculated as the length of the path 
divided by the number of collisions 

A = - { l )T = * . (3.55) 

\/2ird 2 (v)p n T \/2Tid 2 p n 



3.8 Mean collision time 



From the mean free path, it is easy to calculate the mean collision time. The mean collision 
time t co h is simply the average time a particle travels before it collides with another particle. 
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So, if the mean free path A was average distance a particle will travel before a collision and the 
average speed of the particles were (v), the average time t co ii should be 

T "" = M = T^kM (3 ' 56) 



where we have used the expression for the average velocity (equation (3.48)). 
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We now have the theoretical foundation we need to develop the first numerical model we will 
use to study flow in nanoporous media. It is called Direct Simulation Monte Carlo (DSMC), 
and is a stochastic particle model that has showed incredible predictive power for flow in the 
high Knudsen number regime. The model was developed by G. A. Bird in 1976 and was quickly 
picked up by engineers working in the field of aerospace. In the upper atmosphere (100 km), 
the mean free path of air is several meters. For space shuttles, this gives a Knudsen number of 
order unity since the size of its nose is of order meter [1]. In the later years, the method has been 
widely used to study microflows which is our main concern in this thesis. In 1992, the model 
was proved to converge towards a solution of the Boltzmann equation (equation (3.26)) in the 
limit where the timestep At — > 0 and the number of particles M — > oo [27]. 

We start the chapter by introducing the model and its basic philosophy. The model has two two 
crucial parts, collisions between particles which is discussed in section 4.3, and how the particles 
interact with the surface. The latter is covered in section 4.2. Another important subject is 
of course how we measure physical quantities like temperature and energy. This is described 
in section 4.4. In section 4.5 we have a longer discussion about the pressure and argue that a 
DSMC gas actually satisfies the ideal gas equation of state. We also derive a relationship between 
a given pressure difference AP and a constant force allowing us to induce flow in the system 
without needing large gradients in the density or temperature. We then have a brief comment 
about the numerical stability and how the timestep and collision cell size introduce errors in 
transport coefficients. We complete the chapter by discussing how we determine whether or not 
a system has reached a steady state in section 4.7. The implementation of all these steps are 
explained in detail in chapter 5. 



4.1 The model 



DSMC is a model that makes full use of statistical mechanics and the kinetic theory of gases. 
The physical system consists of N atoms of the same type in a box with volume V = L x L y L z (Li 
being the length of the box in the z'th dimension) and porosity <fi (the porosity was the fraction 
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of the total volume available for fluids). The atoms all have mass m and an effective diameter d. 
Instead of tracking the trajectory every single atom, we simulate M particles, each representing 
N e Q real atoms. We can interpret this approximation as that all atoms in a small region of space 
around the coordinates of a simulated particle move with approximately the same velocity. The 
system is periodic in all directions which means that (L x , L y , L z ) = (0,0,0), the corners of the 
cube are the same point. If a particle passes through a boundary surface (one of the sides of 
the box), it will just enter at the opposite side, see figure 4.1. The state of a DSMC simulation 
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Figure 4.1: Periodic boundary conditions allows particles to fly out of a system by re-entering 
it in the opposite side. This reduces the amount of finite size system effects. Image from 
http://en.wikipedia.Org/wiki/File:Limiteperiodicite.svg, accessed 16 March, 2014. 

is fully described by the 6M phase variables, three velocities and three positions per simulated 
particle. 

Since we don't have detailed information about the positions of all the real atoms, we cannot 
calculate the forces between the particles. Instead, we assume that the gas particles only undergo 
binary collisions (we neglect collisions between three or more particles), just like we did while 
deriving the Boltzmann equation in section 3.4. Particles are sorted into collision cells before the 
collisions are performed in a stochastic manner, where the rate of collisions and post-collision 
velocities are determined from kinetic theory. We can think that the collision step in the model 
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is an operator, a stochastic function C(r,v, Q), where r and v form the phase space point and 
Q contains all information about the system geometry. Do not worry, this will be clear in a 
minute. 

The equations of motion are integrated by applying the standard Euler method on the positions 
so that Yi{t + At) = Yi{t) + Vi(t)At for particle i. The timestep is chosen small enough so that 
we can split it into two parts; moving and colliding. This is a reasonable assumption as long as 
the timestep At is smaller than the mean collision time r co ii (equation (3.56)) 

~ co11 \f2nd 2 p n {vY ^ ^ 

since the velocity then does not change during the timestep. If a particle interacts with a 
boundary during the timestep, some sort of surface interaction rule is applied before the timestep 
is continued (we allow a particle to collide with the surface several times during a single timestep) . 

The moving step can also be seen as a stochastic operator, A4(r, v, Q) since the surface interac- 
tion often is a stochastic process. Different surface interaction models are discussed in section 
4.2 with a detailed description of the implementation in section 5.1. Statistical properties are 
sampled at the end of each timestep where the physical quantities are sampled as time averages. 
A flow chart illustrating the steps of a typical DSMC algorithm is presented in figure 4.2. 



4.2 Surface interactions 



We remember from section 2.9 that the effects of surface interactions become significant as the 
pore sizes decrease. For very small pores, the number of atoms near the surface is comparable 
to total number of atoms. In an atomic model that calculates forces, these effects are already 
taken care of through the atomic forces, but in DSMC, we need a surface interaction model. In 
this section, we discuss three different models. The main property of these models is to perform 
a statistically correct transfer of energy and momentum between the surface and the colliding 
particles. There are two important parameters that incorporate the differences between gases 
and surfaces of various types; the normal and tangential accommodation coefficients. 



4.2.1 Accommodation coefficients 



When a particle with energy Ei hits a surface, some of the energy might be transferred to the 
wall resulting in an energy change AE. On average, we can define the normal accommodation 
coefficient 

a n = p jT, ( 4 - 2 ) 

Hii — £Li w 

where Ei is the energy of the incoming particles, E r is the energy of the outgoing particles and E w 
is the energy corresponding to the surface temperature T w . A thermal accommodation coefficient 
equal to zero would mean that there is no energy exchange, and we will get the specular wall 
model described below. a n = 1 on the other hand means that all the reflected particles have 
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Figure 4.2: Typical steps for a DSMC algorithm. 

energies corresponding to the surface temperature. This is what we call the thermal wall (or 
diffuse reflection[15]), and there is no correlation between the incoming and outgoing velocities. 
More intricate models may use other values of the accommodation coefficients so the particles 
remember their incoming velocities. We can also define the tangential momentum accommodation 
coefficient 



(ft 



Ti 



(4.3) 



where r, and r r are the incoming and outgoing tangential momentum and t w is the momentum 
of the surface (e.g. a moving surface). 



4.2.2 Specular wall 



The specular wall behaves just like how a classical mirror reflects light. The colliding particles 
are reflected so that the normal component of the velocity is reversed while the tangential 
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components remain unchanged. This leads to the famous saying; the angle of incidence equals 
the angle of reflection. Since the magnitude of all momentum components are unchanged, there 
is no exchange of energy with the wall. 



4.2.3 Thermal wall 



If we instead think of the wall as a reservoir with a given temperature T w , we can imagine that 
the particles go into the wall, collide with the wall atoms for a while, and at some point, return 
with no correlation with the incoming velocity. In principle, the outgoing particles doesn't have 
to be the same as those that go into the wall, but as long as there is no net particle flux we 
might as well assume this for simplicity. Once a particle has hit the surface, we can choose a 
new, random velocity vector from a distribution so that the reflected particles have an average 
velocity according to the wall temperature. Since faster particles collide with the wall more 
often (this would also be true if we got new particles from the reservoir), this distribution has 
to reflect this fact. A distribution that satisfies this property is the biased Maxwell-Boltzmann 
distributionjl] 

m mvl 

P n (v n )dv n = — — v n e 2k B T ^dv n , (4.4) 

Kb±w 

for the velocity component normal on the surface and 



in 



P t (v t )dv t = x l e ™ B T wdvu (4.5) 

for the tangential component. Here m is the mass of the particle and ks is Boltzmann's constant 
as usual. However, this distribution does not obey detailed balance since the incoming velocity is 
completely uncorrelated to the outgoing velocity, but it turns out that it provides great theoret- 
ical insight given its simple mathematical form. This, in addition to that it is computationally 
inexpensive (see section 5.2), it is much used in the literature. 



4.2.4 The Cercignani-Lampis model 

A more realistic model is the Cercignani-Lampis model which can be derived requiring detailed 
balance and wall isotropy (i.e. given incoming velocity v' and outgoing velocity v, P(v' — > v) 
is unchanged if v' and v are rotated with the same angle about the surface normal vector) [8]. 
The probability of going from an incoming velocity v' to an outgoing velocity v is given as 



P(v' -> v) 



2a n a t (2 - <r t )fi 



IT 

v 2 n + (l-a n )(v' n ) 2 (v t -(l- at )v' t ) 2 



x exp - - f3 w , 

&n 0- t (2 - G t ) 

xIo[Pl 2VT ^ tVn< \ (4.6) 

0~n ' 
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where v n and vt are the normal and tangential components of the velocities, Iq is the zeroth-order 
modified Bessel function of the first kind and j3 w = (feeT^,) -1 . a n and at are the accommodation 
coefficients discussed in subsection 4.2.1. We see that the tangential component is a normal 
distribution with a non-zero mean (the particles remember their incoming velocity), whereas 
the normal component is more complicated. The normal component distribution is plotted in 
figure 4.3. 



Cercignani-Lampis normal component distribution 
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Figure 4.3: The Cercignani-Lampis normal component distribution for T w = 100 K, m = 
39.948 u (argon), a n = 0.5. We see that particles with high velocities are on average reflected 
with a slightly lower velocity, converging towards the velocity corresponding to the wall temper- 
ature T w . The mean velocity for this temperature is (v) =144 m/s. 



To draw random numbers from this distribution is orders of magnitudes slower than that of 
the thermal wall, since it isn't trivial (if even possible) to invert the cumulative distribution 
function. Instead we must use the von Neumann algorithm which is a accept-reject Monte 
Carlo algorithm[3]. Our main focus in this thesis is to validate the DSMC model by comparing 
it to theoretical results of which most have used the thermal wall. The Cercignani-Lampis 
model is available in the DSMC-code, but due to its computational cost and the low number of 
comparable theoretical results, we have used the thermal wall in all of our simulations. 
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4.3 Collisions 



In a particle simulation with a continuous force field it is not clear how one would define a 
collision event. If two equal atoms with the same velocity move towards each other, the atoms 
would at some point reverse their velocities. In this case, one could define the collision event 
to occur at the time of which their relative distance is at its minimum, but other, equally valid, 
definitions probably exists. It is however clear that a collision should be identified as an event 
that happens when the atoms are close, i.e. short ranged forces. 

As we already have mentioned, we don't operate with forces in the DSMC model. We calculate 
collision rates from the kinetic theory. In order to do so, we do need to choose an underlying 
collision model from which we will calculate the collision rates. We have chosen the hard sphere 
model, where each particle is assumed to be a perfect hard sphere with diameter d and mass 
m. Hard sphere means that two particles with radius R\ and i?2 will undergo an fully elastic 
collision if their relative distance equals the sum of their radii, see figure 4.4. In DSMC, we will 
then apply what we could call a stochastic hard sphere collision model, where we use the hard 
sphere model only to calculate the statistical collision rates. 




Figure 4.4: The hard sphere collision model. Two particles will collide if their relative distance 
becomes smaller than R\ + R2. 



Since collisions should occur to nearby particles only, we sort the particles into spatial cells, 
allowing only particles from the same cell to collide. The dimension of these cells should not 
exceed the mean free path. If this was the case, two particles displaced by a distance larger than 
the mean free path could transfer momentum or energy faster than what would happen in a real 
gas (collisions usually transfer energy and momentum). Note that we allow particles moving 
away from each other to collide, since the simulated particles should not be interpreted as real 
molecules or atoms. In some sense, they are quasi-particles carrying statistical information only. 
They can be interpreted as density packets representing the distribution function / from section 
3.1. Now that we have chosen a collision model, we should compute the collision rates. 
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4.3.1 Number of collisions 



We will here use similar arguments as we did deriving the mean free path in section 3.7. If we 
choose two particles, i and j, with relative velocity v r , each representing N e g real atoms with 
effection collision area A = ird 2 (see section 3.7), in a collision cell with volume V c , the total 
volume sweeped out during a timestep is 

V sweep = N e{i ird 2 v r At. (4.7) 

The probability that they will collide is the total sweeped volume V sweep divided by the total 
cell volume V c 

Poo,, = (4.8) 

If a collision cell has N c particles, a particle has (N c — 1) potential collision partners. So the total 
number of collision pairs (we divide by two to prevent double counting of pairs) is N C (N C — l)/2 
which gives the number of collisions M co \\ 

N C (N C - 1) N c (N c -l)N eS 7rd 2 (v r )At 
Moon = noli = ^7 , (4.9) 

where we replaced the relative velocity v r by the mean value in the cell 

But computing the mean relative velocity (v) in each cell every timestep sounds like a horrendous 
thing to do. It requires to sum over all pairs which is 0(N 2 ), which is exactly what we try to 
avoid in the first place. But we can do a little trick. Instead, we calculate M can( j candidate pairs 
so that 

McoiL = M_ f4 n . 

M cand t; r max ' 

since the probability of collision is proportional to the relative velocity. Each of these candidates 
go through an acceptance-rejection process where we pick a uniform random number TZi £ (0, 1) 
and accept the collision if 

v r < V^Kl (4.12) 

This will only accept (iv)/w™ ax of the candidates and we end up with M co \\ actual collisions, as 
desired. The number of candidate pairs is then computed as 

M,„ = W - i y^ A ' . (4.13) 

If we choose v™ ax very high, we will still perform the correct amount of collisions, but the number 
of rejected collisions would be high and hence the algorithm is inefficient. If a collision pair has 
a higher relative velocity, we simply update this variable (in that cell). 

We should one more time mention that particles moving away from each other can collide. 
This property has, as we will see in section 4.5.1, an interesting consequence leading to the 
ideal gas equation of state. One more thing, we haven't figured out how to perform the actual 
collisions. Until now, we know only how to select collision pairs, so let's calculate the post- 
collision velocities. 
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4.3.2 Post-collision velocities 

After a collision is accepted, we want to choose new velocities conserving both energy and 
momentum. We need a total of six equations to determine the post-collision velocities, where 
four are provided through the conservation laws. Conservation of momentum reveals that the 
center of mass velocity is unchanged 

Vcm = g (v< + Vj) = g K* + v*) = v* m , (4. 14) 

where the energy conservation tells us that the relative velocity vector does not change its 
magnitude 

Vr = | Vi - Vj | = | v* - v*| =v*. (4. 15) 

Here we used that the velocities of the particles are uncorrelated so that Vj • Vj = 0 on average. 
The two remaining degrees of freedom are determined by choosing the direction of the relative 
velocity 

v* = v r [(sin#cos</>)i + (sin0sin0)j + (cos0)k] , (4-16) 

where the angles are uniformly distributed over the unit sphere so that all directions for the 
relative velocity are equally probable. The area element df2 can be written as 

dfl = sin 9 d6 d(f> = - d<p d(cos 9), (4.17) 

so we need to choose 4> and cos 6 uniformly. This is easy, we simply choose 

4> = 2-kTZ 2 cos0 = 27e 3 -l, 

where TI2 and Tl^ are random numbers in the range (0, 1) and calculate sin 6 = y/l — cos 2 6. 
The post-collisions velocities are then found by 

v* = v cm + ^ (4.18) 

v*=v cm -^. (4.19) 

4.4 Measuring physical quantities 

The model, or shall we say, the simulator is in principle fully described. The particles move and 
perform collisions with the surface according to some interaction rule. Then the particles will 
collide with each other. This process goes on an on until we are satisfied or out of computing 
time. Within this framework, statistical mechanics happens and particles behave as they should 
(the model solves the Boltzmann equation). But there is no reason to have a simulator if we're 
not going to use it to learn physics. 

The simulator will take the system from some initial state and guide it through the phase space. 
This was what the ergodicity hypothesis allows us to do (see section 3.3). We can evolve the 
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system through time and visit the phase space with probabilities equal to those of the ensemble. 
The DSMC model is inherently stochastic, so any physical quantity should be computed by 
averaging many instantaneous measurements. We should assume that there will occur gradients 
of the physical quantities (for example gradients in density and temperature), so we should 
calculate local values in the collision cells. In a typical collision cell, there will be maybe ten 
to a hundred particles, so the instantaneous values will fluctuate significantly. But as we know 
from statistics, if the system is in equilibrium, the fluctuations (here the standard deviation) 
in e.g. the energy or temperature will decrease as l/y/N m if measured N m times assuming 
that the measurements are uncorrelated. The latter requirement can be obtained by measuring 
every nth timestep. We can measure the correlation between two states through the velocity 
autocorrelation function given as 

Cv(t) ~ <v(0)v(0)>* (4 ' 20) 
_ 1 ££=iV„(t)-v B (0) (421) 



which is equal to one at t = 0, and decays with time as the system becomes more uncorrelated 
with the initial state. We should then measure physical quantities with a time interval corre- 
sponding to the time where the velocity autocorrelation function has become more or less zero. 
We will now quickly discuss how to measure the physical quantities we will use in our analysis 
later on. 



4.4.1 Energy 

The total energy of a system is as usual given by the sum of the kinetic and potential energy. 
Since we are using the hard sphere model, the potential energy is given as 

%r 2 ) = f° •?, |ri ~ r5 j l < rf d (4-22) 
[ oo it |ri — r2\ < a, 

where collisions will make sure that the relative distance between any particle pair always remains 
larger than the diameter. The total energy of our entire system will then only be the kinetic 
energy 

N 1 

E = E k = Y, -™ n vl (4.23) 

n=l Z 

where m n is the mass of particle n and v n is its scalar velocity. An example implementation 
of how the instantaneous kinetic energy is calculated is given in listing 4.1. Remember that in 
DSMC, each particle represents a given number of real atoms. 

double c a I c u I a t e _ k i n e t i c _ e n e rgy ( vecto r <Vector3 > & v e I o c i t i e s ) { 
double k i n et i c _ e n e rgy = 0; 
for ( int n=0; n< v e I o c i t i e s . s i z e ( ) ; n++) { 
Vector3 velocity = v e I o c i t i e s . at ( n ) ; 

kinetic_energy += 0.5*mass*atoms_per_particle*velocity . NormSquared () ; 
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} 

return k i n et i c _ e n e rgy ; 

} 

Listing 4.1: 



Calculation of kinetic energy. 



Once we have found the kinetic energy, we can easily compute the temperature. 



4.4.2 Temperature 

The temperature is defined through the equipartition theorem using the three momentum degrees 
of freedom 

(E k ) = ^Nk B T, (4.24) 

where (Ek) is the average kinetic energy, N is the number of particles, k B is Boltzmann's constant 
and T is the temperature. The only unknown quantity in this equation is the temperature 

- 2Ek (4.25) 



2,Nk B 



where we have dropped the average value brackets of the kinetic energy because we use this to 
define the instantaneous temperature. Note that if the fluid is flowing (the gas has non-zero 
average velocity), the numerical values of the particle's velocities are higher, which in turn results 
in higher measured temperatures. But of course, this has to be wrong, the temperature should 
not depend on the choice of frame of reference. Imagine a bacteria swimming in the flowing 
fluid, the temperature it feels is proportional to the average kinetic energy compared to the 
local frame of reference. This indicates that we should define a instantaneous local temperature 
T(r,t) which we define as 



m / \ 2m 



E(r,t) 1 /p(r,t) 



p(r,t) 2 \p(r,t 



(4.26) 



where E(r,t), p(r,i) and p(r,i) is the average kinetic energy, density and momentum within 
some volume around the point r. This is of course still just the equipartition theorem where we 
measure the kinetic energy in the frame of reference determined by the fluid around the point r. 
We usually use the collision cells to compute these local values. Now we need to calculate the 
density. 



4.4.3 Density 

Here we should comment on another detail, a consequence of our intermolecular collision model. 
Since there are no forces between the particles, all of them can in principle be at the very same 
point (remember that we used the hard sphere collision model only to calculate the collision 
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rates, not to detect collisions). This will of course not happen, but it is possible to initiate a 
state in that configuration. The number density p n in any volume V is easily calculated through 

N 

Pn = y, (4.27) 

where N is the number of atoms in that volume. This enables us to calculate local densities as 
well as the global density of the system. Again we must not forget that each simulated particle 
represents N e Q real atoms. 



4.4.4 Permeability 

The permeability k is defined through Darcy's law (equation (2.7)) which we discussed in section 
2.4 

k = ^, (4-28) 

where L is the length of the system in the flow direction, /j, is the viscosity, Q is the volumetric 
flow rate, A is the cross sectional area, AP = Pq — Pl are the pressures at x = 0 and x = L. 
The viscosity can be computed with the kinetic theory [2] 



Measuring the permeability then introduces to problems we need to figure out how to solve. The 
first is how we measure the volumetric flow rate. As the name indicates, it is a measure of how 
many units of volume passes through a surface per unit time. In DSMC, we will measure this 
by counting how many particles that undergo a periodic boundary condition shift in the flow 
direction, this is the number flow rate. Assuming we have N particles, each representing N e s 
real atoms in a system with volume V, the volume per particle v is given as 

V 



The volumetric flow rate Q is then simply the number flow rate multiplied by the volume 
per particle. The next problem is that the systems we will study are periodic in the flow 
direction. This implies that the point x = 0 actually is the same point as x = L, which gives 
P(x = 0) = P(x = L). Hence, the pressure difference is zero, no matter how we measure the 
pressure. In the next section, we will have a rather comprehensive discussion about pressure 
and find that a constant acceleration g can be related to a pressure difference AP as 

where m is the mass of an atom, p n is the number density and Ax is the distance between the 
two points of the pressure difference, usually the system length L. The permeability is then 
found as 

k=^^, (4.32) 
which is how we will measure the permeability. 
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4.5 Pressure 



Pressure plays an important role in the field of fluid flow. An applied pressure difference (which 
gives a net force) is usually what induces the flow, in addition to being an important property 
of the fluid. The discussion about pressure contains both how we define the pressure, and we 
measure it in a DSMC simulation. And of course how we induce flow in the simulation. It turns 
out that DSMC satisfies the ideal gas equation of state, so given constant temperature, the local 
pressure is proportional to the local density. A pressure difference, or pressure gradient, would 
then require a similar gradient in the density. This is difficult to obtain in a system with periodic 
boundary conditions because p(x = 0) = p{x = L) for a system of length L. Instead we will 
derive a relation between the pressure gradient and a corresponding constant force which we will 
use to induce flow in the simulations. 



4.5.1 Equation of state 

The free modules in a DSMC program are the collision operator C and the move operator 
A4 which fully (stochastically) determine the time evolution of the system. For systems with 
pairwise interactions (such as hard spheres or the Lennard- Jones potential which we meet in 
section 7.1), the pressure may be defined as (see appendix C for a derivation) 



P = Pn k B T + -L/ ^ F(ry) • rA (4.33) 



i<j 

where the first term is the ideal gas pressure whereas the second term is called the virial of the 
pressure. Here F(ry) is the force between particle i and j, and is their relative distance. In 
DSMC we don't have the details about the forces, but we can formulate a similar expression 
using that the force is the change in momentum per time 

P = p n k B T + — Yl rnAvij-rij, (4.34) 

all collisions 

where Avjj is the change of velocity of one of the particles during a collision [13]. In the collision 
model we have used, there are no correlation between the change in velocity Avjj and the 
displacement vector between the particles 

(Avy • v tj ) = 0, (4.35) 

so the expression for the pressure is reduced to that of an ideal gas 

P = p n k B T. (4.36) 



Since the main focus of this thesis is to study dilute gases where the ideal gas is a good ap- 
proximation, this collision model is sufficient. For dense gases, or liquids, it is possible to apply 
collision models that yields other equations of state [13]. 
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4.5.2 Measuring pressure 

Since the gas satisfies the ideal gas equation of state, this is of course how we measure the 
pressure 

P = p n k B T, (4.37) 

since we already know how to calculate the density and the temperature. The local pressure is 
of course found by using the local values of the density and temperature. 



4.5.3 Applying a pressure gradient 

In order to induce flow in a system, it is common to apply a pressure gradient. A pressure 
gradient means that there acts a nonzero net force on any volume element dV in the system. In 
continuum models like the NSE (see section 2.2), the pressure (and hence the pressure gradients) 
is incorporated as boundary conditions where pressure is specified at given points. A typical 
boundary condition is P{x = 0) = Pq and P(x = L) = Pl, but as we already mentioned, 
periodic boundary conditions is a problem since the points are the very same point. Instead we 
will use ideas from continuum mechanics to relate a given pressure gradient to a constant force 
which we will apply on all particles in the system. In the literature, this is often called gravity 
driven flow. 

We look at a volume element of size Ay = AxAyAz in a channel with a continuous fluid and a 
pressure gradient in the x-direction, see figure 4.5. The net force acting on the volume element 
in the x— direction is 

F = P 2 AyAz - PxAyAz = AyAzAP, (4.38) 

where AP = P 2 — P\. We aim to find a constant force F = rag being equivalent to that of the 
pressure difference. Given an acceleration g, the force is then 

F = mg = Pm AVg. (4.39) 

We aim to find a force equal to the one from the pressure difference 

AP 

F = AyAzAP = AV— — , (4.40) 
Ax 



which gives the relation 



AP 

9 p m Ax ' 



In simple geometries like a tube, the behavior of the flow for both pressure models should be 
similar. However, for disordered systems with regions that can trap particles, we can expect 
some effects that will affect the fluid flow in a non-physical way. An example is shown in figure 
4.6 where a gas driven by a constant acceleration in the x-direction will be slowed down in the 
area marked gray. In a gas driven by a real pressure difference, we expect a net force along the 
channel in all regions. We can now obtain a desired pressure difference through equation (4.41) 
and apply that acceleration to all particles each timestep. 
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Figure 4.5: The net force acting on the volume element AV = AxAyAz in the x— direction is 
given by the pressure difference times area A(P2 — Pi) = AyAzAP. 



4.6 Numerical stability and discretization error 



Most numerical methods have a critical stability criterion where the energy or some other prop- 
erty might diverge if the timestep is too large. For example, while solving PDE's with a finite 
difference scheme, we often encounter the Courant number which is a critical threshold of the 
ratio of the discretization length of space and time. For the one-dimensional wave equation, this 
can be expressed as 

C = W™ Ax < Cmax , (4.42) 

where |x| max is the magnitude of the velocity. If the spatial grid has high resolution, small Ax, 
we need a similarly small timestep At. 

However, since the DSMC model always conserves energy and momentum (during particle col- 
lisions), the method is in principle numerically stable for any timestep. As mentioned in section 
4.1, the timestep is split into two parts; moving and colliding. The timestep should therefore be 
smaller than the mean collision time. Larger timesteps may result in large errors in the transport 
coefficients (such as viscosity and thermal conductivity) [15]. While the timestep is compared to 
the mean collision time t co ii, the collision cell size can be seen as the spatial discretization, and 
be compared to the mean free path A. 
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Figure 4.6: Flow induced by a constant acceleration will not reproduce correct flow behavior 
when the gas in a larger part (marked gray) of the channel flows in the opposite direction of 
the force. In a real pressure-driven flow, the net force on the gas will point along the expected 
flow direction, also in the gray marked area, whereas the acceleration-driven flow will be slowed 
down. 

4.6.1 Finite cell size 

The collision cells allows all particles within a cell to collide with each other. So if the cell size 
is very large, particles from a hot region (in one corner of the collision cell) may collide with 
particles in a colder region (maybe in another corner) that are displaced by a large distance. 
This could enable heat to transfer much faster than it would in a real gas. The cell size L ce \\ 
should therefore at least be smaller than the mean free path[15]. The viscosity can be calculated 
from kinetic theory 



which Garcia et al. [2] used to show that the error in the viscosity has a quadratic dependency 
of the cell size 



If the length of the collision cells equals the mean free path, we could then expect a 10% error 
in the viscosity coefficient. 

4.6.2 Finite timestep 




(4.43) 




(4.44) 



A large timestep may allow particles to travel through several collision cells during a single 
timestep. This would allow information to travel faster than in a real gas and also leads to 
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errors in transport coefficients like the viscosity. Hadjiconstantinou [14] derived an expression 
for the timestep dependency for the viscosity, similar to equation (4.44) 
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(4.45) 



where v m = y^2kB/mT is the most probable velocity. We see that the error is proportional to 
(v m AT/X) 2 which vanishes in the limit At — > 0. 



4.7 Reaching a steady state 



Since we want to study flow in nanoporous media, before inducing the flow, the fluid is on 
average obviously at rest. Immediately after we have started applying the constant force that 
will make the fluid flow, the fluid velocity is still approximately zero. After a certain amount of 
time, the system will reach a steady state which in its most simple form can be defined as when 
the time derivative of the fluid velocity in any region, the local velocity, is zero. We should not 
start to sample flow statistics like the permeability until such a state has been reached. However, 
the system may not be in a steady state even though the average local fluid velocity does not 
change over time. There are other physical quantities like that may still be changing. 

A naive, but simple approach to measure whether or not the fluid velocity has converged is 
to look at the measured temperature defined in equation (4.25). If the gas temperature starts 
out at T =300 K before the flow is induced, the measured temperature will increase while the 
fluid velocity increases. Once the fluid has reached a steady state, the temperature will have 
converged to some value it will continue fluctuating around. For simplicity, this is how we have 
determined whether or not the system has reached the steady state. In future development of 
the code, better methods should be implemented. 
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In this chapter we go into detail about how the DSMC model is implemented in C++. We 
assume that the reader is familiar with the programming language. Instead of going through all 
the classes and their relations, we follow the time line of a simulation, going from the initialization 
of the system to how the timestep is computed. For convenience, an UML diagram is shown in 
figure 5.1. 



StalisticsSampler 



void sampleQ; 



vector<int> particle, indices; 
double volume; 



int collide_particles{} 



vector<double> positions; 
vector<double> velocities; 
vector<Cell> cells; 

Mover 'mover; 

void step(); 

void apply_constant_force(); 
void move{); 

void update_particle..cells(); 
vdo^ollide(2^^^^^_^^_ 



System 



SurfaceCollider 



' FILE *energy_file; 
FILE 'permeability _file; 



void save_state_to_file_binary(); 

void load_state_from_file_binary(); 

void re ad_grid_matrix (string filename, Grid 'grid); 



SurfaceCollider "collider; 
Grid 'world_grid; 



void mov e particle(i nt particle index); 



double surface temperature; 



void collide(G rid Point *grid_point, int particle_index); 



vector<int>voxel_value; 
vector<float>normal_vectors; 
vector<float> tangent_vectors_1 ; 
vector<float>tangent_vectors_2; 



int voxel index Irom position (vector<double> position] 



Figure 5.1: A UML-diagram showing how the classes in the DSMC program are related to each 
other. 

The first step of the full program is to create the geometry we want the particles to be confined by. 
We explain how that is done in section 5.1. Then, we can initialize a system by adding particles 
randomly inside the geometry. Their velocities should be Maxwell-Boltzmann distributed, as 
described in section 3.46. The initialization process is explained in section 5.2 before we are 
ready to perform the timesteps. A timestep is divided into four stages. These stages are 



• accelerate particles to induce flow, 

• move particles and perform surface interactions, 
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• update collision cells, 

• perform collisions between particles, 

which all are discussed in section 5.3. This is everything we need to run a DSMC program in any 
geometry we want. The only thing left to explain is how we have parallelized the code, allowing 
it to run on many processors on a supercomputer. This final piece is explained in section 5.4. 
The full code is available, on demand, by contacting the author at anderhaf@fys.uio.no. 



5.1 Complex geometries 

All the surface interaction models from section 4.2 use the surface normal and tangent vectors 
to calculate the reflected velocities. These vectors are easy to determine if the system consists of 
two parallel plates in the xy-plane, or any other mathematically well described geometry. Such 
systems are interesting as validation test cases, but most real world materials have a more com- 
plex geometry without any simple mathematical description. A very much used representation 
of such geometries is a triangle mesh in which the surface consists of many connected triangles. 
The triangles have a well defined normal vector and tangent plane which is easy to calculate. 
With this method, collision detection is done by checking intersection with each triangle and is 
rather computationally expensive. In this thesis, we have chosen another approach by represent- 
ing the system as a large, binary three-dimensional matrix consisting of voxels, each having the 
value filled or empty. With this model, collision detection is done by a quick memory lookup to 
check if the voxel corresponding to the position of a particle is filled or not. In this section we 
discuss how to create such a matrix, how to identify the surface voxels and how we calculate the 
normal and tangent vectors. 

5.1.1 Binary representation 

Any system geometry is fully described by a three dimensional matrix with dimensions mxnxl. 
Each matrix element represents a voxel in the physical space, and can take values 0 or 1. A 
value of one means that the voxel is filled, whereas zero means the voxel is empty. Since the 
particles only will collide directly with the voxels defining the surface, these are given the value 
2. The idea is best explained with an example. 

5.1.2 Example - a cylinder 

We will now show how to create a cylinder with radius r with this model. The idea is very simple, 
we just loop through every voxel in the system and check whether or not the voxel should be 
marked as filled or empty. We define that no matter how many voxels we have, the radius r is 
given in the range [0, 0.5] so that the system is a 1 x 1 x 1 cube. The algorithm should be easy 
to understand from the code example in listing 5.1. 
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void c r e a t e _ cy I i n d e r ( double radius) { 

float voxel_size_x = 1.0 / num voxels x : 
float voxel_size_y = 1.0 / num voxels y : 

double cy I i n d e r _ ce n te r _ x = voxel_size_x 
double cy I i n d e r _ ce n te r _ y = voxel_size_y 



( num_voxels_x / 2.0); 
( num_voxels_y / 2.0); 



for(int i =0; i <num_voxels_x ; i++) { 

for(int j =0; j <num_voxels_y ; j ++) { 

for(int k = 0; k<num_voxels_z ; k++) { 
double x = i /double ( n um _voxels_x ) 
double y = j /double ( n um _voxels_y ) 
bool is solid = true; 



voxel_size_x / 2.0; 
voxel_size_y / 2.0; 



double dx = (x — cy I i n d e r _ ce n t e r _ x 
double dy = (y — cy I i n d e r _ ce n t e r _ y 
double dr2 = dx*dx + dy*dy; 

if ( d r 2 < r a d i u s * r a d i u s ) { 
i s _ wa 1 1 = false ; 

} 



int index 



i * ny * nz 



j * nz 



if(is_wall) v e r t i c e s [ i n d ex 
else v e r t i c e s [ i n d ex ] = 0; 



f- k; 
= 1; 



} 

calculate_normals_tangents_and_inner_points() 



Listing 5.1: An example showing how to create a cylinder. 



Notice the last line there, the function calculate_ normals _ tangents _ and_ inner _ points (). After 
all voxels are marked as filled or empty, we need to identify the surface voxels and calculate their 
tangent and normal vectors. 



5.1.3 Identifying the surface voxels 



A voxel that is filled, but not part of the surface, will be completely surrounded by filled voxels. 
We define the surface voxels as the filled voxels that have less than 26 neighboring filled voxels. 
The algorithm can be implemented like in listing 5.2 (we also need to take care of the periodic 
boundary conditions, but the idea is best illustrated without that complication). 

bool i s _ s u rf a c e ( int voxe I _ i n dex _ i , int voxel _ i ndex_j , int voxel _ i nd ex_ k ) { 
for (int i =-l;i <=1; i++) { 

for (int j=-l;j < = l;j++) { 
for (int k = -l;k< = l;k++) { 
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} 



// Skip self 

if(i = j = k = 0) continue; 

i f ( wo r I d _ ma t r ix [ voxe I _ i n d ex _ i + i ] [ voxe I _ i n d ex _ j + j 
voxel _ i n dex_ k + k] = 0) { 

// This neighbour is empty, hence a surface voxel 
return true ; 

} 



} 



} 



} 

return false ; 



Listing 5.2: An example showing how to identify the surface voxels. The world_matrix contains 
all the voxel values (zeros and ones). 



This has to be done for every voxel in the system, but only once per geometry. When we have 
identified all surface voxels, we need to calculate the normal and tangent vectors for each of 
them. Once this is done, we can save and use the geometry in a simulation. 



5.1.4 Calculating normal and tangent vectors 

Each surface voxel is a cube with six faces, each having a normal vector pointing out from the 
cube. Each face then defines the tangent plane orthogonal to this normal vector. When a particle 
collides with a surface voxel, we can calculate which face the particle passed through and use 
those vectors to calculate the reflected velocity. However, in this thesis, we have developed a 
new way of describing the surface vectors. We will use the neighboring voxels so that the normal 
vector of a surface voxel contains information of the curvature of the surface in order to have 
more realistic collisions. 

The idea is simpler to illustrate for a two dimensional square, but the concept can easily be 
generalized to three dimensions. A square consisting of 9 pixels has a geometric center r gc , plus 
a center of mass fern which can be defined by the values, the mass, of the voxels 

r cm = ^ 1 i 

where rriij £ {1,0} and rjj is the coordinate of the voxel with r gc as the origin. We define the 
normal vector to be the normalized local center of mass vector 

Tern 

n = | 

I Tern 

In figure 5.2, we have computed the normal vector for four different voxel configurations where 
we see that the direction of the normal vectors seems reasonable. In the three dimensional 
case, we find the normal vectors in exactly the same way, but by using the 26 neighbors. The 
only thing we need to calculate are the tangent vectors. These can be found by first choosing 



(5.2) 
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Figure 5.2: Four different pixels configurations in a 3 x 3 grid. The filled pixels are marked 
black whereas the empty pixels are white. We compute the normal vector of the center surface 
pixels (marked gray) based in its neighbors following equation (5.2). We see that this algorithm 
generates normal vectors that behave as expected. 

a random vector v and apply the Gram-Schmidt process making it orthogonal on the normal 
vector so that 

ti = v — (n • v)n, (5.3) 
which gives the normalized tangent vectors 

ltll 

t 2 = n x ti. (5.5) 

The porosity (j) is found by counting the number of empty voxels divided by the total number 
of voxels 



(5.6) 
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5.2 System initialization 

Now that we know how the geometry is represented, we are ready to discuss the DSMC simulator 
itself. When we want to run a simulation in a given geometry, we need to decide a few things. 
We need to specify 

• the density (p n ), 

• the temperature (To), 

• the physical system size (L x ,L y ,L z ), and 

• the number of atoms each simulated particle represents (iV e g). 

These are needed input parameters that will affect the initialization process. The first step when 
the program starts is to load the geometry from disk. We then create the collision cells, before 
we add all the particles to the system. 



5.2.1 The geometry 

We remember that everything we need to know about the geometry are the voxel matrix of size 
N x N y N z , a normal vector and two tangent vectors per voxel. This data is saved in a class Grid 
where the geometry data is saved in four variables and lookup functions as shown in listing 

typedef enura { 

voxel _type_em pty = 0, 

voxel _ type _ wa 1 1 = 1, 

voxel _ ty pe _ bo u nd a ry = 2 
} voxel _ty pe ; 

class Grid 
{ 

private : 

int nx , ny , nz ; // Number of voxels 
vector <unsigned char> voxels; 
vector <Vector3> normals; 
vector <Vector3> tangentsl; 
vector <Vector3> tangents2; 
public : 

unsigned char &get _ voxe I ( const int &i , const int &j , const int &k) { 
return voxe I s [ i * ny* nz + j*nz + k ] ; 

} 

unsigned char &get_ voxel ( Vector3 &position) { 
int i = nx * (position. x / system _size . x) ; 
int j = ny * (position. y / sy ste m _ si ze . y ) ; 
int k = nz * (position. z / sy ste m _ si ze . z ) ; 

return get_voxel(i,j,k); 
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} 



// The other three are similar 



} 



Listing 5.3: A Grid class example. This class contains everything we need to know about the 
geometry. 

The first part defines the different values a voxel can have, empty, wall and boundary. The 
voxel values and associated surface vectors are stored in std: : vector objects in a linear form 
(an array with one index). This means that given the three voxel coordinates (i,j,k), they are 
mapped onto one index as shown in the code example. 

5.2.2 Collision cells 

As we remember from section 4.3, we will perform collision between particles that are in the 
same collision cell only. We haven't really talked about what such a collision cell is yet, except 
that its size should be smaller than one third of the mean free path. The simplest way to create 
these cells is to just divide the total system volume into smaller boxes of equal size. Then each 
of these cells should have control over which particles that are inside the volume it represents 
(or owns if you like). Then each timestep, a number of collisions is performed in each cell. This 
number was found to be (equation (4.9)) 



We see that one of the factors is the cell volume V c . But some of the cells may have large parts 
of their volume unavailable for fluids, they have a local porosity. This is okay, we just need to 
loop through all of the voxels within a cell and compute the local porosity. An example of how 
this can be done is shown in listing 5.4. 

void c r e a t e _ c e 1 1 s ( ) { 

for(int k = 0;k<grid . nz ; k++) { 

int c_z = float ( k ) / g r i d — >nz * ce 1 1 s _ z ; 
for(int j =0;j<grid . ny ; j++) { 

int c_y = float ( j ) / g r i d — >ny * ce 1 1 s _ y ; 



M coU 



N C (N C - l)N eS ird 2 (v r )At 
2V C 



for(int i =0; i <g r i d . nx ; i ++) { 



int c_x = float ( i ) / g r i d — >nx* ce 1 1 s _ x ; 

// Find the one— dimensional cell index 

int cell_index = ce 1 1 _ i n d ex _ f ro m _ ij k ( c_x , c_y , c_z ) ; 

// Count both the total number of voxels 

// and the number of empty voxels 

Cell &cell = c e I I s . a t ( ce 1 1 _ i n d ex ) ; 

cell . tota I _ voxe I s++; 

cell . em pty _ voxels += world _gr id — >get _ voxe I ( i ,j ,k)< 



voxel _ty pe_ wa 1 1 ; 



} 



} 
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} 

for(int i =0; i < c e I I s . s i z e ( ) ; i++) { 
Cell &ce 1 1 = cells, at ( i ) ; 

double cell_porosity = cell.empty_voxels / cell.total_voxels; 
c e I I . p o r os i t y = ce 1 1 _ p o r o s i t y ; // Set porosity 
cell. volume *= ce 1 1 _ p o r o s i t y ; // Update volume 

} 

} 

Listing 5.4: Example code showing how to find porosity and volume of the collision cells. 



5.2.3 Particles 



Assuming that we have created the grid object, filled in the voxel array and created all the 
collision cells, we are ready to create all the particles. As mentioned earlier in this section, 
we need to specify the system size (L x , L y , L z ). The total system volume is then of course 
found as V^stem = L x L y L z . But we are going to study a system with a given porosity, the 
number of volume available to the fluid divided by the total volume. The porosity was found 
in equation (5.6) by counting all the empty voxels. The volume available for fluid is then 
V = <j)V sys t era = (f)L x L y L z , which combined with the density p n gives us the total number of 
atoms in the system 

Natoms = PnV. (5.7) 

But we are going to create M particles, each representing N e g real atoms, so the total number 
of particles in our system becomes 

t,*- -^atoms PnV /r - Q \ 

M = ^T = ^- (5 - 8) 

Each of these particles is assigned a random position in the physical space. If the particle is 
placed inside a wall, then we find a new, random position until it is safely placed inside the 
available pore space. Each particle gets a velocity according to the input temperature To where 
each velocity component is a normal distribution with standard deviation a v = \Jkj^J\)Jrn. Once 
we have found a position for the particle, we need to add it to the collision cell corresponding 
to its position. An example code showing how this is done is found in listing 5.5. 



void i n i t i a I i z e _ p a r t i c I e s ( ) { 

double system _ vol u me = system _size . x * sy ste m _ s i ze . y * system _size . z ; 
int n u m _ pa rt i c I es = d e n s i t y * vol u me* p o r os i t y / n u m _ atoms_ per_ pa rt icle ; 
double ve I oc i ty _ st a n d a r d _ d e v i a t i o n = sq r t ( bol tzm a n n _ co nsta n t * t e m pe r a t u re 
/ mass ) ; 

for(int index=0; i n dex <n u m _ pa rt i cl es ; index++) { 

// First, assign velocities from the Maxwell— Bolt zmann distribution 
v e I o c i t i e s . a t ( i n d ex ) . x = r nd . n ext G a u ss ( ) * v e I oc i t y _ st a n d a rd _ d e v i a t i o n ; 
v e I o c i t i e s . a t ( i n d ex ) . y = r nd . n ext G a u ss ( ) * v e I oc i t y _ st a n d a rd _ d e v i a t i o n ; 
v e I o c i t i e s . a t ( i n d ex ) . z = r nd . n ext G a u ss ( ) * v e I oc i t y _ st a n d a rd _ d e v i a t i o n ; 
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find_position (index) ; 

// Find the collision cell and add the particle 
int cell_index = ce 1 1 _ i n d ex _ f ro m _ pos i t i o n ( p o s i t i o n s . a t ( i n d ex 
Cell &cell = c e 1 1 s . at ( ce 1 1 _ i n d ex ) ; 
cell .add_particle(index); 

} 

void f i n d _ pos i t i o n ( const int &index) { 

// Assume that the particle is inside the wall 
// until proven otherwise 

bool i s _ i n s i d e _ wa 1 1 = true; 

Vector3 &position = p o s i t i o n s . a t ( i n d ex ) ; 

while ( i s _ i n s i d e _ wa 1 1 ) { 

position. x = sy st e m _ si ze . x* rnd — >next _ do u b le ( ) 
position. y = sy st e m _ si ze . y * rnd — >n ext _ do u b le ( ) 
position. z = syste m _ si ze . z* rnd — >next _ do u b le ( ) 

// Check if this voxel has value equal to voxel _type _wall or 

voxel _ type_ surface 
is_inside_wall = world_grid — >get _ voxel ( position )>=voxe I _type_wall 



} 

} 



Listing 5.5: Particle initialization. 



This method will uniformly distribute all the particles in the available pore space. We are now 
ready to perform the timesteps. 



5.3 Timestep 



The idea of a timestep is to evolve the system a small amount of time At. During this process, the 
particles are first accelerated before they are moved according to their velocity. If the particles 
collide with the surface, we will pick a new velocity before they finish the timestep (which may 
contain many surface collisions). When they have reached their final position, we update the 
collision cells before particle collisions are performed in each cell. We will now explain each of 
these stages. 



5.3.1 Acceleration 



The magnitude of the acceleration is determined by the choice of pressure difference in equation 
(4.41). The velocity is determined using forward Euler 

Vj(t + Ai) = Vj(t) + aAt, (5.9) 

for every particle i. 
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5.3.2 Moving and surface interactions 

All our particles now have some velocity, and we can in principle integrate the position with 
Euler's method 

r i (t + At) = r i (t) + v i (t)At. (5.10) 

But if a particle is very close to the surface, and has a velocity towards it, it might fly into 
the wall during the timestep. It might even move through a boundary voxel and land into the 
inner wall. We don't want to perform collisions with an inner wall voxel, because it does not 
have well defined normal and tangent vectors. Instead, we have to back trace and identify which 
boundary voxel the particle hits first. It is done by a recursive bisection method. 

Assume that we have moved our particle from an empty voxel into an inner wall voxel using the 
whole timestep At, see figure 5.3. We then move it back half a timestep to see whether or not it 
now is in a 1) empty voxel, 2) boundary voxel or 3) wall voxel. If it still is in an inner wall voxel, 
we move back another At /A (and so on, halting the distance each step) until we either find the 
boundary voxel or hit an empty voxel again. If we while moving back hit an empty voxel, we 
accept the move. We haven't used the full timestep At, but a smaller amount of time, r. This 
means that we can continue the timestep that now is a bit shorter, At — r. If the particle at any 
point in the algorithm hits the boundary voxel, we're almost done. We then have to compute 
the exact time 5t the particle will use before it collides with the surface of the boundary voxel 
(the boundary voxel has six faces of which the particle can hit). After the particle is moved to 
that point, we choose a new, random velocity based on the normal and tanget vectors of that 
boundary voxel. 
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Figure 5.3: The collision detection algorithm. A particle first moves the full timestep At (step 
1) from an empty voxel to an inner wall voxel. It is then moved back half a timestep At/2 
(step 2) to see whether or not it is still in the wall. In step 3 we have accepted step 2 and move 
the particle another At/4. Now it is in a boundary voxel. We then move back (step 4) to the 
previous position and compute the amount of time St it is until collision with the surface of that 
boundary voxel. In step 5 we choose a new velocity based on the normal and tangent vectors 
that boundary voxel has. Then we continue the timestep. This whole process may happen 
several times during one timestep. 

The code performing this recursive collision process is found in listing 5.6. This function is called 
for every particle in the system. 



void m o ve _ pa rt i c I e ( int &index, double dt , Random &rnd) { 

double tau = dt ; // Time left in the timestep 

Vector3 &position = p o s i t i o n s . a t ( i n d ex ) ; 

Vector3 ^velocity = v e I o c i t i e s . a t ( i n d ex ) ; 

unsigned char &voxel_value = g r i d . get _ voxe I ( p o s i t i o n ) ; 

do_move ( p os i t i o n , velocity , tau); 

voxel_value = g r i d . get _ voxe I ( p o s i t i o n ) ; 

// We now have three possible outcomes 

i f ( voxe I _ va I u e >= voxe I _ ty pe _ wa 1 1 ) { 

// We hit a wall. First, move back to find boundary 
while ( voxe I _ va I u e != voxel _ type _ bou nda ry ) { 
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i f ( voxe I _ va I u e = voxe I _ ty pe _ wa 1 1 ) { 

do_move ( pos i t i o n , velocity , — tau/2); // Move back half the 

timestep 
tau /= 2; 

voxel_value = g r i d . get _ voxe I ( p o s i t i o n ) ; 
} else { 

// We have now used tau of the total timestep dt 
d t — = tau; 

// We hit an empty voxel, continue timestep if 
// there is anything left in timestep 
if (dt > le-5 &fe depth < 10) { 

move_particle(index ,dt , rnd) ; 

return ; 

} 

} 

} 

// This is a boundary voxel 

unsigned char c o 1 1 i s i o n _ voxe I _ i n d ex = voxel_index; 

while ( voxe I _ va I u e = voxel _ type _ bou nda ry ) { 
co 1 1 i s i o n _ voxe I _ i n d ex = voxel_index; 
do_move ( pos i t i o n , velocity , —tau); // Move back 
// Compute time until collision with voxel boundary surface 
tau = grid. get_time_until_collision(position, velocity, tau, 

collision_voxel_index) ; 
do_move ( pos i t i o n , velocity , tau); 
voxel_value = g r i d . get _ voxe I ( p o s i t i o n ) ; 

} 

d t — = tau; 

Vector3 fenormal = g r i d . get _ norm a I ( co 1 1 i s i o n _ voxe I _ i n d ex ) ; 
Vector3 &tangentl = g r i d . get _ t a ngen 1 1 ( co 1 1 i s i o n _ voxe I _ i n d ex ) ; 
Vector3 &tangent2 = g r i d . get _ t a ngen t2 ( co 1 1 i s i o n _ voxe I _ i n d ex ) ; 
surface_collider.collide(rnd, velocity, normal, tangentl, tangent2) 

} else dt = 0; // Didn't hit any surface during the timestep 

if (dt > le-5) { 

move_particle(index ,dt , rnd) ; 

} 

} 



Listing 5.6: The collision detection algorithm. 



5.3.3 Update collision cells 

Now that all the particles have new positions, we need to update the collision cells so that they 
are aware of the changes. We loop through all particles and see if they have moved into a new 
collision cell. It is easily understood by a code example, see listing 5.7. 



void u p d a t e _ co 1 1 i s i o n _ c e 1 1 s ( ) { 

for ( int index=0; i n dex <n u m _ pa rt i cl es ; i n d ex++) { 

Cell &o I d cell = c e 1 1 _ c u r r e n 1 1 y _ c o n t a i n i n g _ p a rt i c I e ( n ) ; 

Cell &new_cell = ce 1 1 _ t h a t _ s h o u I d _ co n t a i n _ pa rt i c I e ( n ) ; 
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i f ( o I d _ ce 1 1 . i n d ex != new_ ce 1 1 . i n d ex ) { 

old cell . remove_particle(index) ; 

new_cell . add_particle ( index) ; 

} 

} 

} 



Listing 5.7: Updating the collision cell particle lists. 



5.3.4 Perform particle collision 

The final part of the timestep is to perform collisions within each collision cell. We assume that 
each cell knows how many collision partner candidates it should try to collide. Each candidate 
pair is chosen randomly (a particle cannot, of course, collide with itself) and we calculate the 
relative velocity. This is compared to the maximum relative velocity in that cell (as described 
in section 4.3) to decide whether or not the pair should collide. If we do get a collision, new 
random velocities are chosen (conserving both energy and momentum). Again, we illustrate the 
implementation with a code example, see listing 5.8. 



void c o I I i d e ( Random &rnd) { 

// Loop over the candidate collision pairs 

for(int collision=0; c o 1 1 i s i o n <c o 1 1 i s i o n _ p a i r s ; collision++ ) { 
// Pick two particles at random 

int index_0 = ( int ) ( r nd . next _ dou ble ( ) * n u m _ pa rt i c I es ) ; 
int index_l = ( ( int ) ( i ndex_ 0+1+rnd . next _ do u b le ( ) * ( n u m _ pa rt i c I es — 1) ) 
) % num_particles; 

// These indices are local indices in this cell. Find the global 
indices . 

index_0 = g I o b a I _ p a rt i c I e _ i n d i c es [ i ndex_ 0 ] ; 
index_l = g I o b a I _ p a rt i c I e _ i n d i c es [ i ndex_ 1 ] ; 

// Calculate pair's relative speed 

Vector3 &v0 = v e I o c i t i e s . at ( index_0 ) ; 
Vector3 &vl = v e I o c i t i e s . at ( index_ 1 ) ; 
double v_rel = (vO — vl) . length () ; 

// Update if new maximum relative velocity 
if( v_rel > vr_max ) { 
vr_max = v_ rel ; 

} 

// Accept or reject candidate pair according to relative speed 
if( v_rel > rn d . n ext _ dou bl e ( ) * vr_ max ) { 
collide_particles(vO, vl, v_ rel , rnd); 

} 

} 

} 

void c o 1 1 i d e _ p a r t i c I e s ( Vector3 &v0 , Vector3 &vl , Random &rnd) { 
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Vector3 v_center_of_mass = 0.5*(v0 + vl ) ; 
double v_rel = (vl — v2 ) . length () ; 

double cos_theta = 1.0 — 2 . 0* rnd . next _ dou b le ( ) ; 
double sin_theta = sqrt(1.0 — cos_ t h eta * cos_ t heta ) ; 
double phi = 2*M_PI* rnd . next _ dou bl e () ; 

Vector3 relative_velocity (1 ,1 , 1 ) * v _ re I ; 

r e I a t i v e _ v e I o c i t y . x *= cos_theta ; 

r e I a t i v e _ v e I o c i t y . y *= s i n _ t h et a * cos ( p h i ) ; 

r e I a t i v e _ v e 1 o c i t y . z *= s i n _ t h et a * s i n ( p h i ) ; 

vO = v_center_of_mass + 0 . 5 * r e I a t i v e _ v e I o c i t y ; 

vl = v_center_of_mass — 0 . 5 * r e I a t i v e _ v e I o c i t y ; 

} 



Listing 5.8: Example code showing how to perform collisions. 



5.3.5 Final comments 

We have now discussed the implementation and explained the details about every step in the 
algorithm. The sampling of statistics is trivial and is done as explained in section 4.4. After the 
simulation is completed, the state containing all positions and velocities of the particles is saved 
to disk so we can continue the simulation at a later time, or, as we will discuss in chapter 11, 
visualize the particles and their trajectories. Now we just need to explain how the parallelization 
is implemented. 



5.4 Parallelization 

The parallelization complicates parts of the implementation steps described in section 5.3. For 
example, the file containing the geometry information is split into several files, one per processor. 
Each processor will then only know about a subset of the whole system geometry. To simplify 
this reading, we will only explain the basic philosophy of how the code is parallelized. 

Each collision cell is completely independent of the other cells. We can then divide the spatial 
domain into sub domains, each fully controlled by one processor. Each processor is responsible 
for executing the timestep for every particle in the corresponding volume. The processors will 
usually contain many collision cells as illustrated in figure 5.4. We will use the terms processor, 
node and CPU interchangeably. Given that the processors have knowledge about the particles 
living in its volume, the only thing we have to take care of is when particles move from one 
processor to another. We have used (MPI) for the communication between processors, and 
assume that the reader is familiar with how MPI works. 
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Figure 5.4: Illustration of how the spatial domain can be divided into four sub domains, each 
controlled by a processor. Each processor contains many particles that are placed in several 
collision cells (marked grey). 



5.4.1 Topological structure 



The processors are divided into a three dimensional grid with (P x , P y ,P z ) being the number of 
CPU's in each dimension, yielding a total of P = P x P y P z processors. We can then use the grid 
coordinates (p x ,Py,p z ) to uniquely label the processors as shown in figure 5.5. When starting 
a program with MPI, each process is provided a unique identification number p in the range 
[0, P — 1] for P processors. This can be mapped to the 3-dimensional grid coordinates through 



Px(p) 



P 
P P 

P 



Pv(p) = -w modP v 
Pz (p) = P mod P z 



(5.11) 



78 Implementation 



Chapter 5 




Figure 5.5: Processor labeling in a 3-dimensional grid. Each processor is uniquely identified 
through its coordinate (p x ,Py,Pz)- 



whereas the inverted mapping is 

p(j)x,Py,Pz) = PxPzPy + PyPz + Pz- (5.12) 

With the processor id p given, it is easy to determine which sub volume this processor should 
control. If the system is of size Li in the i'th dimension, we can find the node length Lf ode = 
li = Li/ Pi. A processor with coordinates (p x , PyiPz) will control all particles with coordinates 
in the range 

X G \p x lxi (Px + l)ix) 

y G Wvi(Py + l ) l v) 

ze[ Pz l z ,( Pz + l)l z ). (5.13) 

Since the collision cells are independent of each other, collisions will happen in parallel where 
each processor loops through all of its cells colliding the particles as described in section 5.3. 



5.4.2 Exchanging particles 

During a timestep, a particle can move from one processor to one of the 26 neighboring nodes 
(the middle node in a 3 x 3 x 3 grid has a 26 neighbors). After each timestep, all processors 
loop through their particles to find the ones having moved out of the processor's spatial domain. 
This process is illustrated in listing 5.9. 
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double mpi_move() { 

for(int n=0; n<n u m _ pa rt i c I es _ I oca I ; n++) { 

int node_id = topo I ogy — >i n d ex_ f ro m _ pa rt i cl e _ i n d ex ( n ) ; 
if(node_id != myid ) { 

// P article belongs to another node 

} 

} 

} 



Listing 5.9: Detecting which particles moved out of a processor's spatial domain. 



In principle, there are 26 potential receiving nodes, so each node needs to be able to send in- 
formation to all of them. The easiest way to implement this is to let each processor directly 
communicate with all of its neighbors. However, this approach requires a lot more communica- 
tion time than actually needed. 

If we instead send information about these particles through a 3-step process, we can reduce the 
number of neighboring nodes from 26 to 6. This idea is best illustrated in two dimensions, but is 
easily generalized to the three-dimensional case, see figure 5.6. An analysis of the parallelization 
is studied in subsection 6.2 where we compare how well the program performance scales with 
increasing number of processors. 
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(p x ,p y ) - processor coordinate 

Step 1: send particle from (1,1) to (2,1) 
Step 2: send particle from (2,1) to (2,2) 



(0,2) 



(1,2) 



(0,1) 



(1,1) 




(0,0) 



(1,0) 



(2,2) 



r(t) + dt 



Step 2 



Step 1 



(2,0) 



Figure 5.6: The middle node (1,1) has 8 neighbors it needs to communicate with. Each node only 
needs to communicate with its nearest neighbors (4 in two dimensions, 6 in three dimensions), 
because the nearest neighbors can work as intermediate information carriers. A particle that 
moves from processor (1,1) to (2,2) will in step 1 be sent to (2,1), then in step 2 be sent to (2,2). 
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Results 



Every time a scientist creates or implements a model, it is important to verify that the model is 
giving correct results. Or maybe we should rephrase; it is important to verify that the model is 
doing as intended. It is not always clear what correct results means. As George E.P. Box said 

Essentially all models are wrong, but some are useful. (George E.P. Box, 1987) 

A good model should be in agreement with already existing theories (at least those we believe 
are reliable). Take special relativity for example, in the limit of low velocities, it reproduces the 
results of Newtonian mechanics (e.g. the effects time dilation and length contraction vanishes). 
In this chapter we will first validate the model by comparing the outcome of the model to 
established theoretical results in section 6.1. In section 6.2 we discuss how well the performance 
of the implementation scales for an increasing number of processors. We then, in section 6.3, 
analyze the Knudsen correction for the permeability for flow in a cylinder for various Knudsen 
numbers and cylinder radii. We conclude the chapter by studying a more complicated geometry, 
randomly packed spheres, in section 6.4. In all simulations we have done, the mass m was 
chosen to be 6.63 x 10~ 26 kg and the diameter d is 3.62 x 10~ 10 m. These are values describing 
argon, meaning the gas in the system is an argon gas. The timestep is approximately 1 ps in 
all simulations. This is smaller than the mean collision time for gases with density less than 

4.0 x 10 27 m~ 3 (equation (3.56)). 

6.1 Code validation 

We validate the DSMC implementation by comparing numerical results to those of statistical 
mechanics and the kinetic theory. First we let the particles start in some non-equilibrium 
state where all the velocity components are ±vq to confirm that the collision model results 
in the Maxwell-Boltzmann velocity distribution. The velocity vq corresponds to some initial 
temperature To that defines the standard deviation in the Maxwell-Boltzmann distribution. 
Then we study the Poisuille flow (two infinite parallel plates) and compare the spatial velocity 
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distribution v(x) - x being the distance from the lower plate - to numerical solutions to the 
linearized Boltzmann equation. 



6.1.1 Maxwell-Boltzmann velocity distribution 

We showed in section 3.6 that the particles in a gas in equilibrium will have velocities according 
to the Maxwell-Boltzmann distribution. We have run a simulation with 10 5 simulated particles 
with initial velocities given for each particle n as 

%n = w o(l — 2(n mod 2)) 

y n = VI - 2((n + 1) mod 2)) (6.1) 

where v$ is some velocity chosen so that T ~ 300K. There are no walls the particles can exchange 
energy with, so the total energy is conserved. The reason we have chosen this initial velocity 
distribution is that the net momentum will be zero if N is an even number (each particle's 
velocity components is canceled out by the next particle). This is is of course not the Maxwell- 
Boltzmann distribution - we are not in an equilibrium state. We expect that after some time, 
the system will equilibrate and approach the Maxwell-Boltzmann distribution which is given as 
(equation (3.45)) 



P(v i )dv i = J —^ e *><BT dv u (6.2) 

where i S {x, y, z} indicates the velocity component. We ran the simulation for 10 5 timesteps, 
and as we see in figure 6.1, the collision algorithm reproduces this distribution perfectly. The 
temperature that was measured to be T = 298.5 K. 



6.1.2 Poiseuille velocity profile 

The Poiseuille flow through a long channel is a standard and fundamental problem that is 
widely studied in the gas dynamics literature. The system consists of two infinite parallel plates, 
displaced by a distance h in the y-direction. A pressure gradient is applied in the z-direction 
by a constant acceleration g corresponding to equation 4.41. The channel is periodic in the x 
and z— direction, emulating a large system with negligible boundary effects. The gas particles 
collide with the plates through the thermal wall model, so the reflected particles will have zero 
tangential velocity on average. In the continuum limit, Kn — > 0, we expect the velocity profile 
to approach the parabolic solution [4] 

v z (y) = VP^-y(y-h). (6.3) 

However, in the transition flow regime, 0.1 < Kn < 10, we expect a non zero slip velocity [20]. 
The Knudsen number will affect the velocity distribution through the fact that a high Knudsen 
number means a larger mean free path, hence fewer inter-molecular collisions. A low collision 
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Figure 6.1: The system clearly reaches the Maxwell-Boltzmann distribution when the initial ve- 
locity distribution follows equation 6.1. Here the final temperature was measured to T =298.5 K 
which reproduces the expected distribution perfectly. 

rate will make the surface effects propagate slower through the system resulting in a higher 
slip velocity. Ohwada et al. [22] studied the Poiseuille flow for hard-sphere molecules with a 
wide range of Knudsen numbers by numerically solving the linearized Boltzmann equation. By 
using non-dimensionalized units, the result is not dependent on the pressure difference or the 
temperature. The only assumption is that the pressure gradient is so small that the Boltzmann 
equation can be linearized around the equilibrium state. To set up as system like this is a simple 
task with the DSMC algorithm, and is a good test case to validate both the surface interaction 
model and the inter-molecular collision model. 

We chose a cubic system with side length 1.0 [im with a solid plate at y =0 [im and y =1.0 (J.m. 
We ran six simulations with different densities, corresponding to different Knudsen numbers in 
the range [0.1, 11.0]. We want to vary the Knudsen number which was defined as 

Kl1 ~ L ~ V2Trd 2 p n L ^ 

We have substituted the mean free path from equation (3.55) in the last expression so that we 
can choose the Knudsen number through the density 

"- (Kn) - Tsk^E- (6 ' 5) 



In all runs, we chose the number of real atoms per particle N e ff = 10. Assuming ideal gas pressure 
Pq, an applied pressure difference corresponding to O.IPq was chosen to induce the flow. In figure 
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6.2 we see that the velocity profiles produced from the DSMC model is in excellent agreement 
with the solutions obtained by Ohwada et al. for all Knudsen numbers. We also see that as the 
Knudsen number increases, the slip velocity also increases, which is what we expect, see section 
2.9. 
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Figure 6.2: Non-dimensionalized velocity distribution for flow between infinite parallel thermal 
plates for different Knudsen numbers over two orders of magnitude. Each run is compared to the 
linearized Boltzmann solution of Ohwada et al. [22] which shows that the algorithm produces 
correctly the behavior in all regions, near the wall and internally in the channel. An applied 
pressure difference AP = Pa — Pb with Pa = I.IPb is applied. The initial temperature is 
T=300 K. We see how the slip velocity becomes larger for increasing Knudsen numbers, as 
expected from the discussion in section 2.9. 



6.2 Parallelization performance 



No matter how many processors we distribute the work load to, the total computation time 
needed to perform a simulation is obviously not reduced. There are still as many collisions as 
before as the physical problem is identical. The idea is to do many calculations at the same 
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time so that the real time we actually wait is reduced. Parallelizing comes with a cost, as we 
need more logic to allow the processors communicate with each other (exchanging particles and 
waiting for each other to finish each time step). As the number of processors increases, the total 
computation time per processor is reduced, but the time spent on communication often increases. 
We will measure what's called parallel scalability which indicates how efficient a program is when 
the number of processors is increased. There are two different kinds of scalability - weak and 
strong scaling 



• strong scaling is how the computation time changes with an increased number of processors 
on a fixed system size, whereas the 

• weak scaling is how the computation time changes with an increased number of processors 
on a fixed system size per processor. 



6.2.1 Strong scaling 

To see how the program efficiency scales with a fixed system size while increasing the number of 
processors is important if we want to study a specific system (a given system size and geometry, 
e.g. scanned from real data), but we would like to reduce the simulation run time. With a fixed 
system size, the total number of particles per CPU is reduced while increasing the total number 
of processors. We define the strong scaling efficiency rj s as 

/J (6.6) 



;s " Nt N ' 

where t\ is the total run time using one processor and tjy is the total run time using ./V processors. 
We see that r] s £ (0, 1) since Ntjy is the ideal scaling without any communication overhead. In 
this benchmark, we have run the simulation in an empty geometry - there are no surfaces to 
collide with. 

The physical system is a cube with side length L =1.0 |J.m with a density p n = 1.0 x 10 27 m~ 3 
which gives a total of one billion atoms. We choose that one DSMC particle represents 50 atoms, 
yielding a total of 20 million particles in the whole system. The benchmark was performed for 
10000 timesteps (At = 0.005) with 2^ processors from 1 CPU to 512 CPUs yielding a good 
estimate of how efficient the program scales for a relatively large number of processors. In table 
6.1 we have the scaling results with additional information like the number of intermolecular 
collisions per processor. This indicates the amount of computation each processor has to perform. 

The strong scaling efficiency is plotted together with the weak scaling against the number of 
processors in figure 6.3. We see that the program scales great with only 30% overhead for 512 
processors. The weird behavior we see at 128 processors, where the efficiency increases to 0.9, 
is probably due to random luck on the supercomputer where all nodes are physically close to 
each other. In order to get better statistics, this benchmark should be run several times. 
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Table 6.1: Benchmark results showing the strong scaling efficiency r] s for the DSMC program. 
We see that the program scales great with only 30% overhead for 512 processors. The weird 
behavior we see at 128 processors, where the efficiency increases to 0.9, is probably due to 
random luck on the supercomputer where all nodes are physically close to each other. In order 
to get better statistics, this benchmark should be run several times. 



6.2.2 Weak scaling 



Another important scaling problem appears when we want to maximize the simulated system 
size. If we keep a constant system size per CPU, and increase the number of processors, the 
limitation of how big we efficiently can create the system is controlled by the weak scaling. We 
then introduce the weak scaling efficiency r\ w defined as 



tN ' 



(6.7) 



where again t\ is the total run time using one processor and tjy is the run time using N processors. 
If the algorithm scales perfectly, the total run time would remain constant while increasing the 
number of processors (each CPU is ideally independent), but we expect some overhead. This 
implies that the range for r] w also is between zero and one. We have run the same geometry 
as for the strong scaling, but each processor controls a volume of 1 (J.m 3 so that the largest 
system is 512 (J.m 3 . Keeping the same density, but using 500 atoms per particle, gives a total 
of 2 million particles per processor. In table 6.2 and figure 6.3, we see the results for the weak 
scaling efficiency simulation. The weak scaling also shows promising results with a reasonable 
low overhead. 
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Table 6.2: Benchmark results showing the weak scaling efficiency rj w for the DSMC program. 
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Figure 6.3: The weak and strong scaling efficiency, rj w and function of the number of 

processors iVcpu- We see that at 512 processors, both efficiencies are reduced to 60-70% because 
of the MPI communication overhead. Despite the bad statistics, it seems like the efficiency has 
more or less converged to a satisfactory level. 
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6.3 Results for simple geometries 



In this section, we study flow in a simple geometry where the permeability is known from theory. 
The expression for the permeability is only valid for small Knudsen numbers (which we called 
the absolute permeability; the permeability for fluids in the continuum limit), so it is a perfect 
test case for the Knudsen correction factor f c in equation (2.19). 



6.3.1 Flow in a cylinder, varying Knudsen number 



We have induced flow in a cylinder with radius 0.45 |xm with an applied acceleration corre- 
sponding to a pressure difference AP = O.li-bi where Po is the ideal gas pressure at 300 K. 
While varying the density, we adjusted iV e g so that the total number of simulated particles 
was approximately one million. We expect an apparent permeability satisfying the Knudsen 
correction 



k a = koofc = koo[l + a(Kn)Kn] 



1 + 



4Kn 
1 + Kn 



(6.8) 



The analytical absolute permeability for a cylinder with radius r is given by[15] 



*<0O — 0 i 



(6.9) 



which gives the following prediction for the apparent permeability 



k a = [1 + a(Kn)Kn] 



1 + 



4Kn 
1 + Kn 



(6.10) 



After 200k timesteps with sampling, the permeability was calculated according to equation 
4.32. In figure 6.4 we have plotted the measured permeability as a function of Knudsen number 
with the Knudsen corrected analytical solution. The left figure has the Knudsen numbers on 
a logarithmic x-axis to emphasize the lower Knudsen numbers. We see that the measured 
permeability is slightly lower than predicted for the higher Knudsen numbers. This is probably 
an effect from the voxelization of the cylinder where the effective radius is slightly smaller than 
desired. By reducing the radius by a factor 0.99 in equation (6.10), the measured permeability 
perfectly overlaps the analytical solution. 
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Figure 6.4: Permeability as a function of Knudsen number for a cylinder with radius 0.45 [im and 
length 1 [Lin with an applied pressure difference AP = O.lPoj -Po being the ideal gas pressure. 
We control the Knudsen number by varying the density. The blue line is the Knudsen corrected 
analytical solution in equation (6.10). We see a slightly lower permeability than the analytical 
solution, which probably is an effect from the voxelization of the cylinder where the effective 
radius is slightly smaller than desired. By reducing the radius by a factor 0.99 in equation (6.10), 
the measured permeability perfectly overlaps the analytical solution. 



6.3.2 Flow in a cylinder, varying radius 



If we instead keep the Knudsen number constant (Kn = 1.0), we can vary the radius to verify 
equation (6.9). We have studied radii in the range 0.1 (a.m to 0.45 (J.m with the same pressure 
difference as in the previous simulation (AP = O.lPo)- m figure 6.5 we have plotted the mea- 
sured permeability as a function of cylinder radius. The straight line confirms the quadratic 
dependency in equation (6.9). 
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Radius [m] 



Figure 6.5: Logarithmic plot of the permeability for different cylinders with radii in the range 
0.1 (J.m to 0.45 [im with an applied pressure difference AP = O.IPoj Pq being the ideal gas 
pressure. The blue line is the Knudsen corrected analytical solution from [15]. The permeability 
scales with the radius as r 2 . 

6.4 Results for complicated geometries 



We have now seen that the Knudsen correction factor works well for systems with a well defined 
Knudsen number. It does however need one Knudsen number to be able to predict the per- 
meability, but for more complex geometries, there is rather a distribution of Knudsen numbers 
than a single number. It could work as a lower and an upper limit of the permeabilities for 
two different input Knudsen numbers, but they could possibly differ by an order of magnitude. 
Another approach could be to find the average distance (L) to the surface and use that distance 
to estimate the Knudsen number as 

Kn' = A. (6 . n) 

In this section we will study a system consisting of randomly packed spheres that does not have 
a well defined Knudsen number. The spheres may overlap, so their positions are completely 
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independent of each other. The distribution and expectation value of distances to sphere surfaces 
is derived in appendix A also providing a suggestion to the estimated Knudsen number Kn(r, 4>)* 
for packed spheres of radius r and porosity (f>. First we discuss the Carman-Kozeny equation 
which is an analytical expression for the permeability for packed spheres. Then we discuss how 
the simulation is run and discuss the result. 



6.4.1 The Carman-Kozeny equation 

In 1927, Kozeny proposed an equation predicting the (absolute) permeability of packed spheres 
of radius r forming a system with porosity <ft given as 

*~ = 9K(T^P (6 - 12) 

where K is the Kozeny constant which is experimentally measured to be around five for equally 
size spheres [5]. This result has been verified to predict permeabilities in many macro scale 
experiments since its discovery. However, at the nanometer scale, we expect deviations due 
to high Knudsen numbers. Using the Knudsen correction factor with the estimated Knudsen 
number / c (Kn(r, </>)*) (equations (2.19) and (A. 21)), we expect the permeability to be 

2 1 3 

k a {r) = / c [Kn(r, </>)*] ^j^^- (6.13) 



6.4.2 The simulation and results 



We ran the simulation for spheres with radii from 0.01 (J.m to 0.08 |xm while keeping approxi- 
mately constant porosity, <fi ~ 0.3. This was done by adding spheres until the desired porosity 
was obtained. For each radius, ten geometries were created with different seeds yielding better 
statistics. The permeability for each sphere radius is then averaged over the ten different ge- 
ometries. In figure 6.6, we have plotted the permeabilities measured in the simulations with the 
average value of the ten runs per sphere radius. We have also plotted the Knudsen corrected 
Carman-Kozeny permeability which gives a pretty good estimate for the permeability. 
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Figure 6.6: Permeability for different sphere radii in a system consisting of packed spheres. The 
red dots show the numerical results whereas the red line shows the average value for each sphere 
radius. The black line is the Knudsen corrected Carman-Kozeny expression for the permeability 
with the estimated Knudsen number (equation (A. 21)). We see that due to the large statistical 
spread in the sphere configurations, the spread in permeability is also quite large. While the 
Knudsen corrected Carman-Kozeny expression does give a good estimate of the permeability, it 
is clear when the spread in Knudsen numbers is large, it is not sufficient with one single average 
value. 
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Theory 



Molecular Dynamics is another numerical model that describes the behavior of liquids, gases 
and solids at the finest scale of any classical model. We can study the dynamics of single atoms 
and how they interact with each other forming molecules and larger objects like advanced pore 
networks. The idea is simple and has been used since the time of Sir Isaac Newton in the 
17th century when he formulated his laws of motion. With the knowledge of the relevant forces 
between the atoms, we can solve Newton's equations and calculate their dynamics. 

We open this chapter by a short introduction to the model in section 7.1 before we explain how 
forces are calculated with the Lennard- Jones potential in section 7.2. With computed forces, 
we can integrate the atoms with Newton's laws following an time integration scheme which we 
discuss in section 7.3. The last section is concerned with how we keep a constant temperature 
using a thermostat. Physical quantities are measured in the same way as we did in DSMC in 
section 4.4. 



7.1 The model 



The state of a Molecular Dynamics system is fully described by seven variables per atom; three 
positions, three velocities plus the atom type. The phase variables are evolved through the 
laws of motion. It is here common, as in DSMC (see section 4.1), to apply periodic boundary 
conditions. Applied periodic boundary conditions in all directions implies constant volume 
(unless, of course, we rescale the system size which can be done) . The atomic forces are calculated 
as the gradient of a chosen potential that can differ quite a lot depending on the requirements of 
the model. A noble gas like Argon can be modeled with a simple potential called the Lennard- 
Jones potential which will be discussed below. With this potential, one can calculate equilibrium 
thermodynamic properties that are in good agreement with experimental values [26]. System 
statistics are sampled as ensemble averages through ergodicity over large times. A typical 
Molecular Dynamics algorithm is illustrated in figure 7.1. 
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Save state and statistics 


Yes 



Stop 



Figure 7.1: Flow chart illustrating a typical Molecular Dynamics algorithm. 

7.2 Force calculation - potentials 



In general, the potential energy is given as a function of the configuration given by the atom 
positions 



U(r) 



^U 2 (rij)+ U r {Ti,rj,r k ) + 



i<j<k 



(7.1) 



where r is the phase space point, U n is a function of the positions of n atoms, is the relative 
distance between atom i and j, is the position of atom %. Advanced potentials such as ReaxFF 
can even have 5-atom contributions to the energy [24]. The reason for this is simple; when three 
atoms are close to each other, the electron configuration might be different than if there were 
only two atoms. These effects play a large role in forming molecules, such as water. Numerically, 
the calculation of forces is the most expensive part of the whole program, so for simple systems 
or educationally purposes it might be sufficient to use the Lennard- Jones potential. 



7.2.1 The Lennard- Jones potential 



We often see this potential referred to as the 6-12 potential, due to its functional form. It is a 
function that inherently carries the two main properties of atomic forces; the short-ranged Pauli 
repulsion and the long-ranged van der Waals force. The potential is only between pairs of atoms 



U LJ {n 



4e 



/ \ 12 






{.rij) 







(7.2) 
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where is the relative distance between atom i and j, e and a are coupling constants giving 
the depth of the potential well and the distance where the potential is zero. The force is given 
as the gradient of this potential yielding 



FLj(rij) = -VU LJ {r i:i ) = -24e 



a 



12 



,13 



u 



>j< 



(7.3) 



where Ujj is the unit vector pointing from atom i to atom j. In this thesis, we will only study 
the LJ-potential, so we simplify the notation by choosing F^j = F and Ulj = U. The form 
of the potential is shown in figure 7.2. With this potential, we can in principle calculate the 



5 

V/E 
4 




Figure 7.2: The Lennard-Jones potential as a function of relative distance rj,- between two atoms. 
(Image from http://en.wikipedia.Org/wiki/File:12-6-Lennard-Jones-Potential.svg, accessed 18 
March, 2014.) 



forces between atoms that are 10 meters apart. We already know the answer though, it should 
be very, very close to zero. Atoms being far away from each other do not interact. This distance 
is actually not very large. If we insert r%j = 3. Oct into equation (7.3), and choose units so that 
a = e = 1.0, we get F(r = 3.0) ~ 0.01 (compared to the maximum attraction near the potential 
well F(r = 1.24) ~ 2.4, more than 100 times larger). The force decreases slowly towards zero 
for larger displacements. We can exploit this property and only compute forces between atoms 
displaced by a distance smaller than some cut-off radius. 
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7.2.2 Cut-off radius 



In principle, we have to sum over all pairs in the system which for N atoms is 0(N 2 ). This 
calculation can be reduced to O(N) by realizing that the gradient of the potential (hence the 
force) is nearly zero at r ~ 2.5c We now introduce the cut-off radius r cut , which we choose to 
be r cut = 2.5a. The force between a pair of atoms F(rij) is then written as 



F(r^) 



-24e 



ij 



U; 



if rij < r cut 



(7.4) 



if ru > r cut . 



This means that we don't have to compute the forces between atoms that are more than r cut 
apart. So locally, within this radius r cut , the number of pairs we need to sum over is proportional 
to p^, but if we double the system size, most of the extra atoms do not feel the forces from the 
atoms already in the system since they are separated by more than r cu t- So the calculation of 
forces is O(N). 

We need to be careful here, because the potential energy is calculated with the same rules at the 
forces. So an atom just outside r cu t will have zero potential energy whereas an atom just inside 
will have a non-zero energy. This can lead to erroneous fluctuations in the energy. We will fix 
that by shifting the potential energy so that U(r cu t) is zero. The potential energy is then 

U(r tJ ) = { U n LJ ^ ~ U <r«*) f rcut (7.5) 
[0 if rij > rcut, 



where Uu{rij) is the same as in equation 7.2. 



7.3 Time integration 



The idea of doing a Molecular Dynamics simulation is to start from some initial state (ro,vo) 
and compute the time evolution of atoms following Newton's equations with the defined forces. 
We will use this time evolution to sample statistics from states in the phase space, assuming that 
the time evolution will guide the system through the phase space with probabilities according 
to the statistical ensemble. This was the assumption of ergodicity. In a standard Molecular 
Dynamics simulation, we want to sample states from the microcanonical ensemble (NVE) where 
the energy, volume and number of atoms are conserved. Our choice of time integrator should 
therefore conserve energy as good as possible. A good choice is the Velocity Verlet algorithm, 
which is known to be both fast and conserve energy quite welljll]. 

The Velocity Verlet algorithm requires the knowledge of the positions, velocities and instanta- 
neous accelerations of the atoms. The latter is found through the force on an atom aj = Fj/mj, 
so it requires the storage of 9N variables in a system with N atoms. We have derived the 
Velocity Verlet algorithm using the Liouville formulation of time evolution in appendix B. The 
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integration steps are 

, . . . . F(t) At , . 

v(t + At/2)=v(t) + -^ T (7.6) 

r(t + At) = r(t) + v(t + At/2)At (7.7) 

v(t + At) = v(t + At/2) + F(t + At) ^, (7.8) 

m 2 

which is done each timestep for every atom in the system. We call the first step a half kick, 
because it integrates the velocity with forward Euler with half a timestep. The position is then 
also integrated with forward Euler with the full timestep At before we do another half kick. 



7.4 Thermostat 



When N atoms move around, their temperature are defined as in DSMC, through the equipar- 
tition theorem 

T = mr B - < 7 - 9 > 

where E^ is the instantaneous kinetic energy. But what if we would like another temperature? 
If we want to increase the temperature, we see that increasing the kinetic energy would do the 
job. This is in fact a decent way to control the temperature. We say that when we push the 
system towards a given temperature, we apply a thermostat. A much used thermostat is the 
Berendsen thermostat. Assume that the system has a temperature T, and we would like to push 
the system towards a new temperature To. We can think of this as a heat bath in contact with 
the system. The Berendsen thermostat is defined in a way so that 

for some time constant r which essentially determines how fast the system reaches the desired 
temperature. A simple algorithm obeying the above convergence rate is by multiplying the 
velocities of all atoms by a factor 

/ A+ /TL \ 

(7.11) 

where At is the timestep. We see that if the temperature T is higher than the temperature of 
the heat bath To, 7 becomes smaller than one, reducing the velocities of all the atoms. If the 
temperature is lower than desired, 7 is larger than one. While this is a efficient way to gain a 
desired temperature, it does affect the system in a non-physical way. 

Without a thermostat, the states of the system live in the microcanonical ensemble where the 
number of atoms, volume and energy is constant. By applying a thermostat, the energy is clearly 
not conserved and it is tempting to say that we instead have a constant temperature which is 
the canonical ensemble. The Berendsen thermostat might produce states not only from this 
ensemble, so it should be used with care. 
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Implementation 



In this chapter we will go through the implementation of the MD code in the same way we 
did with DSMC in chapter 5. We discuss the stages from the beginning of the simulation and 
introduce implementation techniques when they appear in the process. We here assume that 
we want to simulate ./V atoms in a box of physical size L x x L y x L z using the Lennard-Jones 
potential. Our choice of units is described in appendix E. 

We start with section 8.1 by briefly explaining how the code is parallelized, since this affects 
many of the implementation choices we have made. The steps in the actual program can then 
be explained. We start system initialization in section 8.2 where we describe the FCC lattice 
and how we choose the initial velocities of the atoms. The timestep consists of several stages 
that are explained in section 8.3. In section 8.4 we introduce a model of a solid allowing us to 
study flow in complicated geometries. 



8.1 Parallelization 



The parallelization technique is very similar to what we did in DSMC in section 5.4. The spatial 
domain is divided into P x x P y x P z smaller domains (Pj being the number of processors in the 
ith dimension). Each such domain is then of size l x x l y x l z where U = Li/ Pi. The processor 
arrangement and labeling is done as shown in figure 8.1, where a processor with coordinates 
iPxiPyiPz) controls atoms with coordinates in the range 

x e \p x l x , (px + i)ix) 

ye \Pyly,(Py + l)ly) 

z€\p g lz,(p g + l)l g ). (8.1) 

We represent the coordinates of an atom on processor (p x ,p y ,p z ) in the local coordinate system 
where the CPU's origin Porigin defines the coordinate system. An atom with global coordinates 
r will then have local coordinates r' 

r' = r - porigin- (8.2) 
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Figure 8.1: Processor labeling in a 3-dimensional grid. Each processor is uniquely identified 
through its coordinate (p x ,Py,Pz)- 



We can then easily detect if an atom has moved outside its processor's domain by checking if 
any of the local coordinate components are outside the range [0, Zj) for dimension i. If an atom 
has moved to another processor, it has moved to one of the 26 neighboring processors. We will 
use the same 3-step process as described in subsection 5.4.2 which is best illustrated with the 
2-dimensional example in figure 8.2. We only need to know about the 6 nearest neighbors on 
each processor. We notice a neat detail here, periodic boundary conditions are automatically 
taken care of. If we again look at figure 8.1, we have 4 processors in the x-direction. The 
processor to the right of (3,0,1) is (0,0,1) if we use periodic boundary conditions. But since 
we use the local coordinates on each processor, when an atom moves to another processor, its 
coordinates must be shifted so its has correct local coordinates on the new processor. Each 
processor has one shift vector per neighbor, containing the transformation it needs to apply 
on the atom's coordinates before it moved. If the atom moved through the system boundary 
(periodic boundary conditions), the shift vector must of course reflect this. 



8.2 System initialization 

We are now ready to initialize the MD system. We assume that we will simulate an system with 
volume 

V = L x L y L z , (8.3) 
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(p x ,p y ) - processor coordinate 
Step 1 : send atom from (1 ,1 ) to (2,1 ) 

Step 2: send atom from (2,1 ) to (2,2) 



(0,2) 


(1,2) 


(2,2) 








r(t) + dt 








k 

Step 2 


(0,1) 


(1,1) ^/ 


(2,1) 






r(t) 


Step 1 




(0,0) 


(1,0) 


(2,0) 



Figure 8.2: The middle node (1,1) has 8 neighbors it needs to communicate with. Each node only 
needs to communicate with its nearest neighbors (4 in two dimensions, 6 in three dimensions), 
because the nearest neighbors can work as intermediate information carriers. An atom moving 
from processor (1,1) to (2,2) will in step 1 be sent to (2,1), then in step 2 be sent to (2,2). 



where Lj is the system size in the ith dimension. The system contains ./V argon atoms and is 
performed on 

P = P X PyP Z (8.4) 

processors, Pi being the number of processors in the ith dimension. Each processor controls a 
volume 

Vp = l x lylzi (8-5) 
where li = Li/ Pi is the node length. The origin of processor (p x ,Py,Pz) is given as 



Porigin {Px > Py , Pz ) — Px^x^ Py^yj Pz^zk- 



(8.6) 
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The initialization process consists of creating the atoms, place them at some position and give 
them velocities according to the Maxwell-Boltzmann distribution (equation (3.46)). We will 
initially place the atoms on an face-centered cubic lattice (FCC lattice). 



8.2.1 FCC lattice 



The FCC lattice is a lattice structure where we place atoms on all 8 corners of a cube in addition 
to all the faces, hence the name. In figure 8.3 we see how the atoms are placed on the FCC 
lattice. An FCC lattice can be constructed from a unit cell consisting of four atoms with local 




a 



Figure 8.3: The face-centered cubic lattice. There are atoms on all eight corners of a cube, 

in addition to the six faces. The length a is called the lattice constant. (Image from http: 
//en.wikipedia.org/wiki/File:Lattice_face_centered_cubic.svg, accessed 19 March, 2014.) 

positions (relative to the origin of the unit cell) 

ri=0i + 0j + 0k (8.7) 

r 2 = 2 1 + ^ + 0k ( 8 - 8 ) 
r 3 = 0i + -j + -k (8.9) 
r 4 = ^i + 0j + |k, (8.10) 
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where a is the lattice constant, the length of the unit cell. By adding several such unit cells, 
each with origin determined by the unit cell coordinate (m, n, I) 

r U nit cell = rnai + naj + lak, (8.11) 

we can create a system of arbitrary size. Each atom is assigned a random velocity according to 
the Maxwell-Boltzmann distribution. The total number of atoms in the system is based on how 
many unit cells we create. If we label the number of unit cells per processor in the ith dimension 
N c , the total number of unit cells is 

N c = N^N^N^P, (8.12) 

which gives the total number of atoms (each unit cell contains 4 atoms) 

N = AN C . (8.13) 

The system size is then 

L x = aN^P x (8.14) 

L y = aN^P y (8.15) 

L z = aN^P z , (8.16) 

yielding a total volume V = a 3 N c . In listing 8.1, we have shown how this is implemented in 
C++. 



void c r e a t e _ f cc _ I a 1 1 i c e ( ) { 

double xCell[4] = {0, 0.5, 0.5, 0} 

double y Cel I [4] = {0, 0.5, 0, 0.5} 

double zCell[4] = {0, 0, 0.5, 0.5} 

int index = 0; 

double ve I oc i ty _ s t a n d a rd _ d e v i a t i o n = sq rt ( bol tzm a n n _ co nsta n t * 

temperature/mass) ; 
for(int x = 0; x < u n it _ eel ls_ per _ cpu _x ; x++) { 

for(int y = 0; y < u n i t _ eel ls_ per _ c pu _y ; y++) { 

for(int z = 0; z < u n i t _ce 1 1 s _ pe r _ cpu _ z ; z++) { 
// Loop over the 4 atoms in this unit cell 
for (int k = 0; k < 4; k++) { 

po s i t i o n s . a t ( i n d ex ) . x = (x+xCell[kj) * FCCa ; 
po s i t i o n s . a t ( i n d ex ) . y = ( y+y Cell [ k ] ) * FCC a ; 
po s i t i o n s . a t ( i n d ex ) . z = (z+zCell[kj) * FCCa ; 
v e I o c i t i e s . at ( i n d ex ) . x = rnd . nextGa uss () * 

velocity_standard_deviation ; 
v e I o c i t i e s . at ( i n d ex ) . y = rn d . n ext Ga u ss ( ) * 

velocity_standard_deviation ; 
v e I o c i t i e s . at ( i n d ex ) . z = rn d . n ext Ga u ss ( ) * 

velocity_standard_deviation ; 
index++; // Increase atom counter 

} 

} 



Listing 8.1: Code example showing how to create an FCC lattice on one of the processors. 
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This is the whole initialization process. The atoms are where they should be, they have a 
statistically correct velocity and we are ready to integrate forward in time. 



8.3 Timestep 

We are using the Velocity Verlet algorithm as we discussed in section 7.3 and derived in appendix 
B. The timestep calculation starts by what we called a half kick, where the velocities are inte- 
grated half a timestep with forward Euler. But to be able to do so, we need to have calculated 
the forces. This part is the most technical in the whole algorithm, because of two things. We 
need to find a way to implement the cut-off radius so that we don't loop over too many atom 
pairs, and each processor needs to have information about the atoms on its neighboring atoms 
(there are forces between atoms living on different CPUs). We will first discuss how duplicates 
of atoms, so-called ghost atoms, are copied between nodes before we explain how we use cell 
lists to compute forces between nearby atoms only. 

8.3.1 Ghost atoms 

Assume now that we are going to calculate the forces between the atoms on some node with 
coordinates (p x ,Py,Pz)- The atoms near the boundary will of course feel the presence of the 
atoms on the next processor (if they are sufficiently near). The node will then, from each of its 
26 neighbors, get a copy of the atoms that are less than r cut from the node boundary. In figure 
8.4, this is illustrated in a two-dimensional example. 

Since we don't calculate forces between atoms displaced by more than the cut-off radius, we 
actually compute all the forces we want. This process is done for every processor, every timestep, 
so that all nodes have all the information they need to compute the forces (and potential energy). 

8.3.2 Cell lists 

We will now sort the atoms on a node into cells, similar to the collision cells in DSMC (see 
section 5.3). Each processor has atoms with coordinates in the range [0,k), U being the node 
length in the ith dimension, and ghost atoms with coordinates (— r cut ,0) U [Zj,Zj + r cut ). We 
choose the cells to be about the same size as the cut-off radius, which gives the number of such 
cells Nf (force cells) in the ith dimension 

N f = \li/r cut ] + 2, (8.17) 

where we have added two extra ghost cells containing the ghost atoms, \a] is the ceiling function 
of a, that is, the smallest integer number not larger than a. The reason we use the ceiling function 
is so the length of the node exactly matches the combined length of the cells (minus the two 
extra ghost cells). The size of the cells is then at least r cut so that the atoms within a cell will 
not feel the presence of atoms from any other cells than the 26 nearest neighbors. In figure 8.5 
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we have illustrated the idea in a two-dimensional system. We see the total spatial domain of 
a processor with the copied ghost atoms (in the gray area), divided into cells of size r cu t- The 
yellow cell will compute forces between all of its atoms and the atoms in the green cells. This is 
done for every inner cell (the ghost atoms are computed on another processor). 
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Figure 8.4: The middle processor with coordinates (1,1) in a two-dimensional system receives 
a copy of all the atoms less than r cut from the boundary (ghost atoms, marked red) from the 
neighboring eight processors. The gray area is the region with the ghost atoms. The black dots 
are the atoms on node (1,1), gray dots are atoms that do not contribute to the forces of the 
black atoms. 



108 Implementation 



Chapter 8 



8.3.3 Calculation of forces 



The atoms are now sorted into cells. We then loop over all cells, and for each cell, loop over the 
26 neighboring cells. With each cell pair, loop over all pairs of atoms and calculate forces between 
them if their relative distance is smaller than r cu t- In the inner scope of the loop, we now have 
two atoms i and j with positions and r^. Given their relative distance rjj = (Ax, Ay, Az), 
we are ready to compute the force between them. The Lennard- Jones force is given as (for 
fij ^ ''cut) 



F(r«) 



-24e 



a 



12 

33 




u 



(8.18) 



and if we choose the so-called MD units [o = 1.0, e = 1.0, see appendix E), the x-component of 
the force is given as 



-24 




1 



? • 

v l 3 , 



Ax 



1.19) 



where Ax is the x— component of r^-. Notice that we have factored out 1 /rh. This is arithmetical 
convenient for the implementation, because, as we see in equation (8.19), we need r^ 6 and 2 , 
which both are easily calculated from r^ 2 as powers. 

First, we skip atoms displaced by a distance larger than r cut and compute the square of the 
relative distance 



r 2 
i j 



(8.20) 



Its inverse is gives us the higher powers 



3.21) 



r ■ ■ 

ij 



12 



,-2 



(8.22) 
(8.23) 



In listing 8.2 we have shown the code that calculates the Lennard-Jones force. We see that 
Newton's third law is implemented by skipping a pair of atoms if atom_index_0<atom_index_l, 
avoiding calculating that pair twice. 



void a p p I y _ I e n n a rd _ j on es ( ) { 

// Loop through all local cells (not including ghosts) 
for ( int ce 1 1 _ i n d ex _ x = 1 ; ce 1 1 _ i n dex_ x<=n u m _ eel Is . x ; cell 
for (int ce 1 1 _ i n d ex _ y = 1 ; ce 1 1 _ i n dex_ y <=n u m _ eel Is . y ; cell 
for (int ce 1 1 _ i n d ex_ z = 1 ; eel I _ i n d ex_ z <=n u m _ eel Is . z ; cell 

// Index of this cell 

cell_index = ce 1 1 _ i n d ex _ f ro m _ i j k ( eel I _ i n d ex _x , cell_index 
cell index z) ; 



i n d ex _ x+H 
i n d ex _ y +H 
index z+H 
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} 



// Loop through all neighbors (including ghosts) of this cell. 
for ( int i =ce 1 1 _ i n d ex _x — 1 ; i <=ce 1 1 _ i n d ex_x + 1 ; i++) { 
for (int j=ce 1 1 _ i n d ex _ y — 1 ; j <=ce 1 1 _ i n d ex_ y +1 ; j++) { 
for (int k=ce 1 1 _ i n d ex_ z — 1 ; k<=ce 1 1 _ i n d ex _ z +1 ; k++) { 

// Index of neighbor cell 

cell_index_2 = cell_index_from_ijk(i , j ,k) ; 

// Head is pointing to the first atom index in a cell 
int atom_index_0 = head _atoms [ c e 1 1 _ i n d ex ] ; 

while ( atom_index_0 != EMPTY) { / The last atom in a cell points at 

the EMPTY value 
int atom_index_l = head _atoms [ ce 1 1 _ i n d ex _ 2 ] ; 
while (atom_index_l != EMPTY) { 

i f ( atom _index_0 < atom _ i ndex_ 1 ) { // Newton's 3rd law 

double dx = p o s i t i o n s . a t ( atom_index_0 ) . x— p o s i t i o n s . a t ( 

atom_index_l ) .x; 
double dy = p o s i t i o n s . a t ( atom_index_0 ) . y— p o s i t i o n s . a t ( 

atom_index_l ) .y; 
double dz = p o s i t i o n s . a t ( atom_index_0 ) . z— p o s i t i o n s . a t ( 
atom index 1 ) . z ; 



double dr2 = dx*dx 
if 



dy*dy + dz*dz : 



( d r2 <r_ cu t _sq u a red ) { 
double dr2_inverse = 1.0/dr2; 

double dr6_inverse = d r2 _ i n ve rse * d r 2 _ i n ve rse * d r2 _ i i 
double drl2_inverse = d r6 _ i n ve rse * d r6 _ i n ve rse ; 
double force = 24* (2* d r 1 2 _ i n ve rse — d r6 _ i n ve rse ) * d r 2 
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// Next atom 



} 

atom_index_0 = I i n ked _ I i st _ of _ a to m s [ atom_index_0 
}}}}}}} 



Next atom 



Listing 8.2: Implementation of the Lennard- Jones force. The code loops over all cells and their 
neighbors computing the forces between all the atoms in neighboring cells. 



8.3.4 The timestep 

Now that we can calculate the forces, we can summarize the whole timestep. Assuming that the 
forces are already calculated, we start by the half kick, integrating the velocity half a timestep. 
Then we move the atoms by integrating the positions with forward Euler a whole timestep At. 



110 Implementation 



Chapter 8 



Since some atoms now may have changed which processor they live on, we have to move atoms 
with MPI. Once this is done, every processor is ready to calculate the forces, but in order to do 
so, we need the ghost atoms on every node. It is time to reset the accelerations from the last 
timestep, before we apply the constant force we added to induce flow in the system. After this 
is done, we can calculate the forces and do the final half kick, which is the last stage of the 
timestep. In listing 8.3 we show how the step function is implemented, simply calling functions 
performing all the stages we have now explained. 

void ste p ( ) { 

ha If _ kick () ; 
move ( ) ; 
mpi_move ( ) ; 
mpi_copy () ; 

reset_accelerations() ; 
apply_constant_force() ; 
apply_lennard_jones() ; 
ha If _ kick () ; 

} 

Listing 8.3: Code showing the stages during a timestep. 



8.4 Complex geometries 

As we did with DSMC, we want to study flow in arbitrary geometries. To be able to do this, 
we need to create a model that satisfies some properties we already have in DSMC. The flowing 
fluid needs to 

• be confined in a subset of the total volume, 

• have realistic interactions with the solid wall, and 

• have energy drained by the solid. 

The first requirement is not completely strict, but most of the flowing fluid should be in the 
free volume. The reason why we need to drain the energy is that in order to induce fluid flow, 
we apply a constant force that increases the total energy of the system. We want to reach an 
equilibrium where the average rate of drained energy exactly matches the energy added by the 
applied force. We assume that the geometry is described as a boolean function G : R 3 — > {1,0} 
that fully determines whether a point in space is part of the solid or the free volume. A convenient 
representation is the voxelization described in section 5.1.1. 

8.4.1 A naive approach 

To illustrate the basic idea, we will discuss the simplest model satisfying only the first two 
requirements. Given a molecular dynamics state, we can loop through the positions r-j of each 
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atom i, and mark the atoms as frozen if G(rj) = 1 (which means that this point is part of 
the solid). Atoms marked as frozen will not move at all, but all forces are calculated normally. 
One way of interpreting the non-moving frozen atoms is that they have infinite mass. The total 
energy is still conserved in the system. An implementation is illustrated in listing 8.4. 

bool po i n t _ i s _ a _ so I i d ( Vector3 &position) { 

int voxe I _ i n d ex _ i = position. x / sy stem _ 1 e ngt h . x * n u m _ voxels . x ; 

int voxel _ i ndex_j = position. y / sy ste m _ le ngt h . y * n u m _ voxels . y ; 

int voxe I _ i ndex_ k = position. z / sy ste m _ le ngt h . z * n u m _ voxels . z ; 

// The world matrix is a binary matrix 

return wor I d _ m a t rix [ voxe I _ i n dex_ i , voxel _ i n d ex _j , voxel _ i n d ex_ k ] ; 

} 

void ma rk_frozen _ atoms ( ) { 

for(int i =0; i <number_of_atoms ; i++) { 
Vector3 &position = p o s i t i o n s . a t ( i ) ; 
i f ( po i n t _ i s _ a _ so I i d ( p os i t i o n ) ) { 
atom_types . at ( i ) = FROZEN; 

} 

} 

} 

void move() { 

for(int i =0; i <number_of_atoms ; i++) { 
if (atom_types . at ( i ) != FROZEN) { 
// Only move non-frozen atoms 
positions. at(i).x += velocities . a t ( i ) .x*dt; 
positions. at(i).y += velocities . a t ( i ) .y*dt; 
positions. at(i).z += velocities . a t ( i ) .z*dt; 

} 

} 

} 



Listing 8.4: Example code showing how to mark atoms within a solid. 

If the atoms forming the solid are dense enough, very few atoms will be inside the wall, so 
the first requirement is satisfied. The interaction between the solid and the flowing fluid is as 
realistic as the potential is, so the only requirement not satisfied is the drainage of the energy. 

8.4.2 A simple model of a solid 

We can improve the solid model by adding a harmonic oscillator potential to all the frozen 
atoms. Instead of freezing them completely, we allow them to vibrate around their equilibrium 
position q which is the initial position of the simulation. The force on atom i in the solid is then 

F( s )(r i ) = -fc(r i -qi)-^24e 

where k is the strength of the oscillator, is the equilibrium position for atom i and the 
last part is the Lennard- Jones potential described in section 7.2. The energy is of course still 
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conserved with this model, but we can now apply a thermostat on the solid atoms making them 
act as a large reservoir trying to keep the temperature at some wall temperature T w . With this 
model, we can study flow in any geometry with a behavior near that of the DSMC model. An 
implementation is shown in listing 8.5. 



void a p p I y _ co n st a n t _ f o rce ( ) { 

for ( n =0; n<n um ber_of_atoms ; n++) { 

if (atom_type . at (n) != FROZEN) v e I o c i t i e s . a t ( n ) . a t ( f I o w 
+= acceleration*dt; 



direction 



} 



} 



void a p p I y _ h a r m o n i c _ osc i 1 1 a t o r ( ) { 
double s p r i n g _ co n st a n t = 1000.0; 
for(n=0; n<number_of_atoms ; n++) { 
if (atom_type . at (n) = FROZEN) { 
double dx = p o s i t i o n s . a t ( n ) 
double dy = p o s i t i o n s . a t ( n ) 
double dz = p o s i t i o n s . a t ( n ) 



accelerations . at(n) .x += 
accelerations . at(n) .y 
accelerations . at(n) . z 



} 

void ste p ( ) { 

ha If _ kick () ; 
move ( ) ; 
mpi_move ( ) ; 
mpi_copy () ; 

reset_accelerations() ; 
a p p I y _ co n st a n t _ f o rce ( ) ; 

a p p I y len na rd _jones () ; 

apply_harmonic_oscillator() ; 
ha If _ kick () ; 



x — i n i t i a I _ p o s i t i o n s . a t ( n ) . x ; 
y — i n i t i a I _ p o s i t i o n s . a t ( n ) . y ; 
z — i n i t i a I _ p o s i t i o n s . a t ( n ) . z ; 
-s p r i n g _ co n st a n t * dx / mass; 
= — s p r i n g _ co n st a n t * dy / mass; 
= — s p r i n g _ co n st a n t * dz / mass; 



} 



Listing 8.5: Implementation of the harmonic oscillator model of a solid. 
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Figure 8.5: The spatial domain for a processor with the added ghost atoms in the gray area. 
The domain is divided into cells of size r cut so that the yellow cell only needs to compute forces 
between atoms in the cell and the neighboring 8 green cells. The processor will only process the 
cells (the ghost atoms are computed on another processor). 
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Results 



In this chapter we discuss the results of our MD simulations. Here we start by a scaling perfor- 
mance benchmark in section 9.1 before we in section 9.2 move on to a validation of the Knudsen 
correction we confirmed with the DSMC model. Most of the concepts are equivalent as in DSMC 
in section 6.2, so the reader is is assumed to have read that section. 



9.1 Parallelization performance 



The parallelization scheme we have used in MD is very similar to the one we used in DSMC. 
In section 6.2, we measured both the strong and the weak scaling efficiency {rj s and rj w ) which 
measure two different ways of scaling the program. We will not repeat the discussion about 
these except present the result for both scaling for the MD program. 



9.1.1 Strong scaling 

The strong scaling efficiency r/ s tells us how well the program scales if we keep the system size 
constant while increasing the number of processors. The strong scaling efficiency was defined as 

* - W~ N ' <£u) 

where t± is the total run time using one processor and i^y is the total run time using N processors. 
In this benchmark, we chose a system consisting of 48 x 48 x 48 = 110592 unit cells which 
gives a total of 442368 atoms. When we increase the number of processors, these 48 unit 
cells per dimension are distributed on the processors. The timestep was At = 0.02 and the 
initial temperature was approximately To =100 K so that the final gas temperature was around 
T =60 K. This fall in temperature is explained by the fact that the FCC lattice is the potential 
minimum, so with a non-zero temperature, some of the initial kinetic energy will be converted 
to potential energy. We run the program with the number of processors in the range 1 to 512. 
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In table 9.1 we have presented the results for this benchmark which is summarized in figure 9.1. 
We see that when going from one to two processors, we get a more than ideal speedup with 
Vs(Ncpxj = 2) = 1.17 which can be explained by the CPU cache. When a processor compute 
with a value stored at some memory address, it will first look in the three levels of cache to 
see if the value of the memory address is cached there. Cached values are much faster available 
for computation than those only in the memory. When going from one processor to two, a 
larger part of the positions array (which is used in the force calculation) can be cached, hence 
a more than double speedup is obtainable. When using a larger number of processors, the MPI 
communication time starts increasing so that the total time increases per CPU. 



Ncpu 


^atoms/^CPU 


tN 




1 


442368 


29196 s 


1.00 


2 


221184 


12471 s 


1.17 


4 


110592 


6435 s 


1.13 


8 


55296 


3469 s 


1.05 


16 


27648 


1926 s 


0.95 


32 


13824 


1021 s 


0.89 


64 


6912 


715 s 


0.64 


128 


3456 


375 s 


0.61 


256 


1728 


220 s 


0.52 


512 


864 


150 s 


0.38 



Table 9.1: Benchmark results showing the strong scaling efficiency i] s for the MD program. 
We see that when going from one to two processors, we get a more than ideal speedup with 
r] s (Ncpu = 2) = 1.17 which can be explained by the CPU cache. When a processor compute 
with a value stored at some memory address, it will first look in the three levels of cache to 
see if the value of the memory address is cached there. Cached values are much faster available 
for computation than those only in the memory. When going from one processor to two, a 
larger part of the positions array (which is used in the force calculation) can be cached, hence 
a more than double speedup is obtainable. When using a larger number of processors, the MPI 
communication time starts increasing so that the total time increases per CPU. 
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Weak scaling efficiency 



1.2 




0 2' 1 — 1 — 1 — 1 — 1 

10° 10 1 10 2 10 3 10 4 

Number of processors 

Figure 9.1: Benchmark results showing the strong and the weak scaling efficiency, r] s and rj w , 
for the MD program. We see that when going from one to two processors, we get a more than 
ideal speedup with rj s (Ncp\j = 2) = 1.17 which can be explained by the CPU cache. When a 
processor compute with a value stored at some memory address, it will first look in the three 
levels of cache to see if the value of the memory address is cached there. Cached values are much 
faster available for computation than those only in the memory. When going from one processor 
to two, a larger part of the positions array (which is used in the force calculation) can be cached, 
hence a more than double speedup is obtainable. When using a larger number of processors, the 
MPI communication time starts increasing so that the total time increases per CPU. The weak 
scaling shows similar scaling in the whole range of processors. In order to reduce the statistical 
noise, several benchmarks should be run and averaged. 



9.1.2 Weak scaling 

If we increase the number of processors, but keep the nubmer of atoms per processor constant, 
we can use the weak scaling efficiency rj w to see how the program scales in this case. The weak 
scaling efficiency is defined as 



h 

IJw = — 



(9.2) 
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where again t\ is the total run time using one processor and is the run time using N processors. 
In this benchmark, we chose 10 x 10 x 10 = 1000 unit cells per processor yielding a total of 4000 
atoms per CPU. The timestep here as well is At = 0.02 with the same initial temperature as 
in the strong scaling so that the final temperature is approximately T =60 K. In table 9.2 and 
figure 9.1, we have presented the results for the weak scaling. 





-^atoms 


t N 


Vw 


1 


4000 


1246 s 


1.00 


2 


8000 


1278 s 


0.97 


4 


16000 


1398 s 


0.89 


8 


32000 


1441 s 


0.86 


16 


64000 


1521 s 


0.82 


32 


128000 


1620 s 


0.77 


64 


256000 


1639 s 


0.76 


128 


512000 


1735 s 


0.72 


256 


1024000 


2027 s 


0.61 


512 


2048000 


3379 s 


0.37 



Table 9.2: Benchmark results showing the weak scaling efficiency rj w for the MD program. 



9.2 Flow in a cylinder, varying Knudsen number 

We have used the MD program to simulate flow in a cylinder with a fixed radius R, just like we 
did in section 6.3 with DSMC. We will measure the permeability to see how well the Knudsen 
correction factor (see section 2.13) predicts the permeability for very small pores (here a cylinder) 
with an atomic model. The cylinder was created with the solid model we described in section 
8.4.2. Since the solid now consists of atoms (in DSMC it was just a scalar field defining the 
surface) , we should create the cylinder carefully. We have prepared the system with the following 
steps 

• Heat the system 2000 timesteps, T =300 K 

• Thermalize the system 2000 timesteps 

• Heat the system 2000 timesteps, T =300 K 

• Thermalize the system 2000 timesteps 

• Create cylinder (explained below) 

• Reduce density (explained below). 

By first heating the system, we let the system melt from a solid state to a liquid state. This 
allows the system to become more random than in the initial lattice. Once we create the cylinder, 
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we apply a harmonic oscillator potential on the atoms in the cylinder so they more or less stay 
in their initial position. We have here chosen a system consisting of 64 x 64 x 64 = 262144 unit 
cells which gives a total of 1048576 atoms to begin with. This in turn yields a system with size 
Li = 336.64 in the ith dimension. By choosing the cylinder radius to be r = 0.45Lj and the flow 
in the z-direction, we mark all atoms within a distance r from the center (in the xy-plane) as gas 
atoms, and atoms outside r as solid atoms. But the cut-off radius was chosen to be r cu t = 2.5a 
(see section 7.2.2), so the gas atoms inside the cylinder will not feel the presence of the atoms 
outside r + 2.5a directly. To save computation time, we have removed all atoms outside this 
radius. Such a cylinder is shown in figure 9.2 where the yellow atoms are the solid wall whereas 
the green atoms are the gas. Once we have the cylinder, we can choose the density yielding a 




Figure 9.2: The final cylinder we used to simulate gas to validate the Knudsen correction factor 
for the permeability (see section 2.13). Here the yellow atoms are the solid wall (with a harmonic 
oscillator potential on each atom, keeping the atoms in place), and the green atoms are the gas. 
Since we have used a cut-off radius r cu t = 2.5a, we have removed the solid atoms outside the 
radius r + 2.5a. This visualization is done with the tool discussed and developed in chapter 11. 



desired Knudsen number 



Pn(Kn) 



1 



(9.3) 



y/2-Kd 2 KnL ' 

where we have used d =3.62 A as we did in DSMC. The flow is induced in the same way as we 
did in DSMC (explained in section 4.5), where we can, given a pressure difference AP, accelerate 
the atoms inside the cylinder according to equation (4.41) 

AP 

9 = — aT- 
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Here, Ax is the system length in the flow direction, L z . We have chosen a pressure difference 
equal to 0.2Pn, where Pq = pnksT being the ideal gas pressure. We can then for each Knudsen 
number induce flow and measure the permeability. The system reached an equilibrium state be- 
fore we sampled for 500000 timesteps. In figure 9.3, we have plotted the measured permeabilities 
for different Knudsen numbers with the Knudsen corrected analytical solution 



k a = [1 + a(Kn)Kn] 



1 + 



4Kn 
1 + Kn 



(9.4) 



The left figure, using a logarithmic x-axis shows that the permeabilities in the lower range 
of the Knudsen numbers are predicted very by the theory. The right figure shows that the 
permeabilities in the whole range, two orders of magnitudes, follows the expression in equation 
(9.4). For the high Knudsen numbers, we see an increase in the statistical noise which is explained 
by the fact that a high Knudsen number is obtained by a low density which gives a low number 
of atoms. In order to get better statistics, we would need to run the simulation for a longer 
time. 
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Figure 9.3: Permeabilities for Knudsen numbers in the range 0.1 to 10.0 in a cylinder with 
radius r =151 A. The left figure has a logarithmic x— axis to emphasize the good prediction in 
the lower Knudsen range. The right figure shows that the MD code produces results according 
to the Knudsen corrected permeability for a cylinder in equation (9.4) in the entire range. The 
increased statistical noise is explained by that for large Knudsen numbers, the number of gas 
atoms is low. 
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Whenever scientists perform an experiment in the lab or on a computer, they need to study 
the information outcome of the experiment. This could be the positions of some particles, the 
temperature of a gas or maybe the color of a fluid. Once we have this information, we need to 
represent it in a way that is useful for understanding the results. We often put numbers in a table 
or plot them as a graph. This is an extremely powerful ways to look at data and we can learn a 
lot by seeing how two or more variables are dependent of each other. It is convenient to introduce 
a general term, visualization, which is just a visual representation of information (maybe not any 
visual representation. Numbers in a table might be excluded.) One could say that a world map 
is a representation of the geometrical information describing how the continents are connected. 
The graph of some data is a relation between two or more variables. The output result from 
any computer simulation or experiment is information that can be visualized in some way. 

There already exists a lot of software that can be used to visualize data in different ways. For 
particle simulations one can for example use VMD 1 or Ovito 2 . Both of these are great tools 
allowing us to visualize the time trajectories of atoms or particles. There are two drawbacks 
that have motivated the author to write a visualization tool from scratch. 

In both VMD and Ovito, the way we navigate with the camera is inspired by other 3d software 
like 3ds Max, Maya and Blender where the camera is pointing towards a point whereas the mouse 
controls the position of the camera on the surface of a sphere centered in point. If we want to 
follow the movement of a single particle, this way of controlling the camera is inconvenient. 
We would like to control the camera in the way a space ship is controlled in a game, like in a 
first person shooter game. Here the mouse controls the direction the camera is looking and the 
camera position is controlled by the keyboard. 

In addition, there are performance drawbacks with both programs where we have noticed that 
medium sized datasets (number of particles ~ 1 X 10 6 ) lead to a frame rate that makes it 
very difficult to navigate around in the system. Also, the geometry of the DSMC code cannot 
be visualized in VMD or Ovito since these are particle visualizers only and the geometry is 
represented as a scalar field (the voxels from section 5.1). To be able to solve these problems, 
we have implemented our own visualizer using OpenGL. In this chapter, we first give a brief 
introduction to OpenGL in section 10.1 explaining its basic ideas. We go through the concepts 
of vertices, primitives and how colors and textures are linearly interpolated from the values of 
the vertices. Vertex Buffer Objects are briefly explained in section 10.2 before we go through the 
sequence of shaders in the rendering pipeline in section 10.3. Note that this is not a complete 
introduction of OpenGL. Only the basic concepts that are required to understand how we have 
made the visualization tools are covered. 



ht t p : / / www .ks.uiuc.edu/ Research / vmd / 
2 http://www. ovito. org/ 
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10.1 What is OpenGL? 



OpenGL (Open Graphics Library) is an open source graphics library providing an application 
programming interface (API) to communicate with a graphics processing unit (GPU) in order 
to get hardware-accelerated rendering. Rendering is the process of generating an image from 
models in an environment. These models contain information about the geometry and associated 
textures. A model is typically represented by a set of points, triangles or some other set of 
primitives. Before discussing the model, we should explain what a primitive is. 



10.1.1 Primitives 



In OpenGL, the primitive decides how to interpret a set of vertices. The same set of vertices 
may be interpreted, hence rendered, very differently on the screen. Three points can be used 
to draw three single dots, a triangle or a part of a rectangle among other different geometries. 
To illustrate this idea, in figure 10.1, the four vertices {(0, 0), (0, 1), (1, 1), (1, 0)} have been 
rendered with the primitives GL_ POINTS, GL_ LINES, GL_ TRIANGLES, GL_ QUADS and 
GL_ TRIANGLE '_ STRIP. The rendered objects are of course very different depending on the 
primitive. When using for example the GL_ TRIANGLES, OpenGL interprets groups of three 
vertices as one triangle. In the case of GL_ QUADS, it will of course use groups of four vertices 
to define the renderable object. All of the primitives (except GL_POINTS and GL_LINES) 
form one or more two dimensional surfaces that are colored from either the interpolated values 
between the vertices (which have a defined color) or from a texture map. 
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Figure 10.1: The vertices {(0, 0), (0, 1), (1, 1), (1, 0)} have been rendered as the prim- 
itives (from left) GL_ POINTS, GL_ LINES, GL_ TRIANGLES, GL_ QUADS and 
GL_ TRIANGLE^ STRIP. We see that the final rendered geometrical objects are quite 
different for the different primitives. 

10.1.2 Color interpolation 

When creating a primitive consisting of N vertices, we can color each vertex r-j with an RGBA 
vector 



(ri,gi,bi,ai) 



;io.i) 



where the components are the red, green, blue and alpha values for vertex i. The first three 
components define the color whereas the last component is the opacity. Since we only specify the 
color for the vertices, there are a lot of points in between them that do not have a defined color 
value. OpenGL assigns colors to these inner points by linearly interpolating the color values 
from the vertices. As an example, let us say we have a triangle with three vertices p a , p;, and p c . 
Any point p in this triangle can be uniquely specified by using barycentric coordinates which is 
a set of three numbers (a, b, c) in the range [0, 1], normalized so that a + b + c = 1 [21]. Once 
we have these coordinates, the point p in the global coordinate system (in which the vertices pj 
are defined) is found as 



p = ap a + bp b + cp c . 
The barycentric coordinates are found through 



10.2) 



^(P,Pfe,Pc) 



^(P,Pa,Pc 
-4(Pa,Pfc,Pc 



^(P,Pa,Pfe) 
^(Pa,Pfc,Pc)' 



10.3) 



where A(&, b, c) is the area formed by the three vertices a, b and c. If we want to find the color 
of a point p inside the triangle, We use the barycentric coordinates as the weights to linearly 
interpolate the colors specified for the vertices 



c(p) = ac a + bc b + cc c 



10.4) 



where Cj is the color we gave the vertex at p^. In figure 10.2 we see how the colors are interpolated 
from the three values at the triangle vertices. 
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Figure 10.2: The three vertices {(0, 0), (1, 1), (2, 0)} colored green, blue and red, forms a triangle 
where the inner points of the triangle is colored by the interpolation between the three vertices. 

10.1.3 Textures 

Instead of using the colors specified per vertex, we can upload a texture (for example an image) 
to the graphics card so that the GPU can use that image as the source of how to color the pixels 
(this can actually be combined with the color values as we soon will see). A texture consists of 
m x n pixels where each pixel has a color Cf(i,j). Each vertex i of a triangle is assigned a local 
texture coordinate 

ti = (t x ,t y ), (10.5) 

where t x ,t y £ [0,1]. This is done so the pixel coordinates are independent of the size of the 
texture. We have a simple mapping to the global texture coordinates (the actual pixel coordinates 
in the image) t* = (t x ,t*) where 

t* x = m-t x 

t* y = n-t y , (10.6) 

where we have added the asterisk on the global coordinates since we will mostly use the local 
ones. If a pixel with local coordinate t in the texture has a color value Ct(t), we can find the 
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color c(p) of a point p within the triangle by interpolating the local texture coordinates just like 
we did with the colors. Remember that the point p has barycentric coordinates (a, b, c), which 
gives the interpolated local texture coordinates t 

t(p) = ot a + bt b + ct c , (10.7) 

where tj is the local texture coordinate assigned to vertex i. The color of this point p is then 
simply 

c(p) = ct[t(p)]. (10.8) 

In figure 10.3 we have illustrated how the texture interpolation works. In the left figure, the 
texture is mapped with texture coordinates corresponding to the coordinates of the triangle 
vertices. In the figure to the right we have a skewed mapping making the image look skewed. 
We can of course combine both these methods (colors and textures) by applying both a texture 




Figure 10.3: Showing how texture interpolation works. Left: the three vertices 
{(0, 0), (1, 1), (2, 0)} are assigned the local texture coordinates {(0, 0), (0.5, 1), (1, 0)} where 
we see how the texture are transformed onto the triangle . Right: the three vertices 
{(0, 0), (1, 1), (2, 0)} are assigned the local texture coordinates {(0, 0), (0, 1), (1, 0)} where we 
see that the texture on the triangle is skewed because of the transformation. Note that the 
coordinates in the figure at the vertices are the texture coordinates, not the coordinates of the 
vertices. 

coordinate and a color value to each vertex. OpenGL will then render an image where the 
rendered color C becomes 

C(p) = ct[t(p)] 0 c(p), (10.9) 

where 0 is the element-wise multiplication operator defined as 

(sLQb)i = ai-bi, (10.10) 

where a, b S R N and di is the i'th component of vector d. In figure 10.4 we have combined both 
the color and the texture giving a triangle with the same image of Albert Einstein colored in 
the same way as the triangle in figure 10.2. 
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Figure 10.4: We can combine both colors and textures to color points within a triangle according 
to equation (10.9). Here we have used the same colors as in figure 10.2 and the texture of Albert 
Einstein in figure 10.3. 

10.1.4 Model 

A model of an object contains all the information fully defining how a geometric object looks 
like without any effects from the environment, such as light. The model is fully described by a 
set A4 containing primitive objects V, each having 

• a vertex array, 

• the primitive type, 

• a color array, 

• a texture id, 

• a local texture coordinate array, and 
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• a normal vector array. 

We have discussed the vertices, the primitive, the color and texture coordinate arrays (remember, 
one color and one texture coordinate per vertex). We also need a texture id (which is just an 
int variable) allowing us to tell the GPU which texture it should apply to this primitive object. 
In addition, we can assign a normal vector to each vertex which can be used to create realistic 
lighting and other effects. It's time for an example of a model. 

We now want a model of a die. A die is a cube with six faces. The model Ai would then 
contain six primitive objects "Pi, each having an array of four vertices. The primitive type would 
be GL_ QUADS where the color of course could be your favorite color. We need to upload six 
textures (one face has one dot, another face has two dots etc) which gives us six different texture 
id's. The local texture coordinate array is simply the four corners of the unit square whereas the 
normal vectors should point outwards from the cube. In figure 10.5 we have used this technique 
to draw a red die. 




Figure 10.5: Drawing a die using the technique described in subsection 10.1.4. The color was 
chosen to be red whereas each of the six faces were assigned one of the textures in the black box. 
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10.2 Vertex Buffer Objects 

A Vertex Buffer Object (VBO) is a feature in OpenGL that allows the user to upload an array of 
vertices to the graphics card which can be used for very fast rendering. This could for example 
be a set of positions, colors or normal vectors. Once this data is uploaded to the GPU, we can 
render the model described by the VBO in very few function calls. We ask the GPU to give us a 
buffer identifier. Then we tell the GPU to allocate enough memory to store our vertices before 
we copy the data from the main memory to the graphics card. The data is now available on the 
graphics card. 



10.3 Rendering pipeline 

Now that we know what a VBO is, we are ready to render the objects described by a set of 
VBOs on the screen. First we activate the data we want to render by telling the graphics card 
(of course by using the OpenGL API) which VBO identifier represents the positions, colors, 
normal vectors and texture coordinates. Then, we ask the GPU to render the activated data 
as the primitive the vertices represent (we know if the vertices should be rendered as triangles, 
quads or lines). 

We can now make full use of the parallel architecture of the GPU. On a high end NVIDIA 
graphics card 1 , there are often more than 2000 cores available to run small programs called 
shaders that process the input data from vertices to the final image rendered on the screen. The 
shaders are written in a programming language called GLSL 2 and are compiled on the GPU 
runtime. The full job of processing all the vertices calculating the final color values for all the 
pixels are distributed on these cores. In this section we discuss the different rendering steps in 
what's called the rendering pipeline that are relevant to develop the visualization tools in this 
thesis. In figure 10.6 we have illustrated the rendering pipeline containing the stages that we 
have used. 



10.3.1 Ins, outs and uniforms 

As we have already mentioned, the shaders are small programs run in parallel. Each shader gets 
some input data which are called input- variables. These are specific values for this instance of 
the shader, for example the position vertex for a particle. The shader then usually calculates 
something (a common task in the first shader stage is to convert a vertex from the model space 
to the projection space), before it sends a variable to the next step in the pipeline. The variables 
going out of a shader is of course called an output-variable. Just to be sure we understand; the 
next shader gets this output-variable as an input-variable for further processing. 

x Such as the NVIDIA GTX Titan. At the time of writing, it costs about 8000 NOK on http://www.komplett. 

no. 

2 GLSL (OpenGL Shading Language) is a programming language with syntax similar to C. It provides a set 
of linear algebra operations that are heavily used in computer graphics. 
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Figure 10.6: Diagram of the OpenGL rendering pipeline. The gray boxes are shader steps (the 
shader programs can be written by the user). Each vertex is processed in the vertex shader 
before it is sent to the geometry shader that can create zero or more vertices based on the input 
vertex. Each vertex is then analyzed in the clipping program that removes vertices that cannot 
be rendered. The rasterization program creates a two dimensional image of every triangle so 
that the fragment shader can merge everything to color values per pixel on the screen. Not that 
the full pipeline contains more steps, but these are the relevant stages for understanding what 
our visualization tool does. 

But there are of course some variables that are constant through the whole rendering process. 
If all the particles have the same color, we don't need to specify the color per atom (this would 
use a lot of extra bandwidth on the GPU). Variables like this should be sent to the GPU as 
so-called uniform- variables. They are available in all instances of every shader stage. 



10.3.2 Vertex shader 



The vertex shader is executed once per vertex in the input VBOs. It specifies which input vertex 
that is to be interpreted as the position, color, normal vector and/or texture coordinate if they 
are specified. The input position vertex is usually in the model space which are local coordinates 
for this specific object. A typical vertex shader applies the Model View Projection matrix on 
the vertex, transforming it from the model space to the projection space which often is assumed 
in the later pipeline stages. The latter may be done on the geometry shader instead if a three 
dimensional object is created at that stage. 



10.3.3 Geometry shader 

Each vertex from the vertex shader is a part of a primitive (such as GL_ POINTS or GL_ TRIANGLES). 
A geometry shader takes a primitive (i.e. a set of vertices, each processed by the vertex shader) 
as input and outputs zero or more primitives that may be of another type than the input prim- 
itive. A typical use is to describe a geometrical object (this could be a sphere or a tube) on 
the geometry shader so that the input primitive is just one single vertex; the position. This 
significantly reduces the memory and bandwidth usage on the GPU which usually gives great 
performance improvements. 

The geometry shader can also be run in instancing mode which means that the shader program 
is run a given number of time per vertex with an invocation id given. If a particle system obeys 
periodic boundary conditions, we can for example use the geometry shader instancing to add 26 
periodic copies of the system making the system look larger than it is. 
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10.3.4 Clipping 

After the geometry shader has decided which primitives we want rendered, some of the vertices 
may not be visible on the screen. They can be behind the camera, too far to the right or in some 
way outside the camera view. When we in the final stages (the fragment shader) are going to 
decide colors on every pixels, we don't need to compute primitives that does not contribute to 
the final image. This is called clipping and is a very simple process to perform in the projection 
space. All vertices outside the clip volume (the volume that will be rendered) will be discarded 
and not computed at the rasterization stage. 

10.3.5 Rasterization 

The rasterization program will for each primitive determine which pixels that are a part of the 
primitive. Each active pixel will have interpolated colors and texture coordinates as described in 
subsection 10.1.3 before we reach the fragment shader, the final stage in the rendering pipeline. 

10.3.6 Fragment shader 

The fragment shader is executed one time per pixel for each of the primitives that contributes to 
that pixel. If the primitives are transparent, the final color value on a pixel will be a combination 
of all the primitives. The fragment shader gets interpolated color values and texture coordinates 
(like we explained in section 10.1.2) that are used to decide the actual color of that pixel. This 
is the last step we cover in the OpenGL pipeline. The final color values are then written to a 
framebuffer which finally is the image that is shown on the screen on the computer. 



Chapter 11 



Advanced particle visualizer 



With our newly acquired knowledge about OpenGL and learned how we can use the API to 
render objects on the screen, we have everything we need to develop our own visualization tools 
that can handle datasets from both MD and DSMC. As we now should be well aware of, the 
state of a system with N particles is described by the 3N particle positions and the 3N velocity 
components. If we save this information every timestep of a simulation, we can use it to render 
a time series, an animation of the trajectories of all the particles. We will render the particles as 
spheres, but are going to cheat a bit. An actual sphere rendered in OpenGL would need to be 
composed of many triangles forming the spherical shape. To be able to render a smooth sphere, 
we would need more than 100 triangles per sphere as we will see in section 11.1. We will apply 
a trick used in computer games for years. Instead of rendering spheres, we use something called 
billboards, which is a rectangle with an image (a texture) of a sphere, always pointing towards 
the camera. In section 11.1, we explain how we effectively can create and render billboards with 
the geometry shader on the GPU. If the particle system has periodic symmetry (both MD and 
DSMC use periodic boundary conditions), we can also use the geometry shader to render copies 
of the system, making the illusion that the system is larger than it really is. 

We did a performance test to see the amount of particles it is possible to visualize with this 
method. The benchmark measure is the number of frames per second the program can manage 
to render. This is done in section 11.1.3. When we visualize a dataset from a DSMC simulation, 
we should, in addition to the particle positions, render the surface geometry (which, as we 
remember from section 5.1, is a voxelized scalar field). We will then be able to see the surface 
the particles collide with which will make it easier to understand their behavior. In section 11.2, 
we discuss the so-called marching cubes algorithm, which allows us to create a set of renderable 
triangles from the iso-surface of a scalar field (the points where the scalar field values intersect 
some value). 
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11.1 Billboards 



We could in principle create a spherical shape by connecting triangles or other primitives as 
illustrated in figure 11.1. The figure contains two spheres with different numbers of primitives. 
We need a substantial number of primitives to get a shape that looks like a sphere (200 in the 
left sphere in the figure), but we can cheat a bit, by instead rendering something that looks like 
a sphere A billboard is, as the name suggests, a rectangle filled with a texture, always pointing 




Figure 11.1: A sphere can be created by connecting triangles or other primitives. The sphere 
to the left is made of 200 primitives whereas the one to the right is made of 25 primitives. It is 
clear that we need a certain number, more than 25, of primitives before we are convinced that 
the object should represent a sphere. 

towards the camera. The texture is going to be a circle (a sphere does indeed look like a circle 
when rendered on the screen anyway). It is quite easy to render such a rectangle with OpenGL, 
we already have a primitive called GL_ QUADS which is exactly what we need. The only thing 
we need to do is provide four vertices, the corners in the rectangle, and give each vertex a texture 
coordinate. The four vertices Vj can be calculated from one single vertex r, the particle position, 

by 

Vl = r + (-Ax, -Ay) (11.1) 
V2 = r + (—Ax, Ay) 
v 3 = r + (Ax, Ay) 
v 4 = r + (Ax, -Ay), 

where {L x = 2Ax,L y = 2Ay) is the size of the billboard, see figure 11.2. This is already what 
Ovito does 1 , but we are going to improve this even more. Instead of computing the four vertices 

1 See the source code at http:/ /www.ovito.org/index.php/download2 
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Figure 11.2: A billboard is made of four vertices V"i,V2,V3 and V4 that can be calculated from 
one vertex r (equation (11.1)). 

and uploading these as GL_QUADS to the GPU, we will only upload the positions of the 
particles, and exploit the geometry shader to create the billboard vertices on the GPU. 

Before we start visualizing the data, all timesteps of a simulation are uploaded as VBO's on the 
GPU so that we don't need to upload data every frame. The VBO is rendered as GL_ POINTS 
so each vertex represents the center position of a particle. We will now follow a single vertex 
through the pipeline 



11.1.1 The pipeline 

The VBO can now be seen as one OpenGL model containing the positions of all the particles. 
The vertices are then in the model space. Every single vertex starts its life in the pipeline by 
going into the vertex shader. The vertex shader is pretty simple in this case, it just transforms 
the input vertex from the model space to the projection space as we can see in listing 11.1. 

^version 330\n" 

uniform mat4 q t _ M od el Vi e w P roj ect i on M a t rix ; 
in vec4 qt Vertex; 
void main (void ) 



{ 



gl Position 



q t _ M od e I V i ew P roj ect i o n M a t r ix * qt_Vertex ; 



} 



Listing 11.1: billboardVertexShader.glsl 
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This position is then sent to the geometry shader as an instance of the GL_ POINTS primitive. 
The geometry shader will, from that single vertex (the position, now in the projection space), 
create and emit four vertices that are displaced from the input vertex as explained in equation 
(11.1) and figure 11.2. The output primitive is here the GL_ TRIANGLE_ STRIP which will 
create two triangles with vertices {vi,V2,V3} and {v2,V3,V4}. The size of the billboard is 
available as a uniform (as explained in subsection 10.3.1) in the geometry shader. If we want 
to visualize particles with different sizes (e.g. two atom types with different atomic radii), we 
just render different VBO's with the corresponding size. We want the four vertices to form a 
rectangle pointing towards the camera, so the operation in equation (11.1) should be done in 
the view space (here the z-direction is orthogonal on the camera plane which forms the xy- 
plane). Since the position vertex now is in the projection space, we should transform the four 
displacement vectors to the projection space by applying the qt_ ProjectionMatrix before we add 
them together. We then set the texture coordinate (the local coordinate in the texture image) 
on each vertex and emit it with the EmitVertex() function. The algorithm might be easier to 
understand with a code example, see listing 11.2. 



^version 400 






layout ( points ) in; 






layout ( triangle strip , max vertices = 4 ) out; 






uniform mat4 qt ProjectionMatrix; 






uniform vec2 size; 






out vec2 texCoord ; 






void main (void) { 






vec4 pos = gl in[0].gl Position; 






gl Position = pos + qt P roj ect i o n M a t r i x * vec4(— s i z e . x 


— size.y, 0.0, 




" 0.0) ; 






texCoord = vec2(0.0, 0.0); 






EmitVertexQ ; 






gl Position = pos + qt P roj ect i o n M a t r ix *vec4(— s i z e . x 


s i z e . y , 0.0, 


0.0) 


texCoord = vec2(0.0, 1.0); 






EmitVertexQ ; 






gl Position = pos + qt P roj ect i o n M a t r ix *vec4 ( s i z e . x , 


— size.y, 0.0 , 


0.0) 


3 

texCoord = vec2 ( 1 . 0 , 0.0); 






EmitVertexQ ; 






gl Position = pos + qt P roj ect i o n M a t r i x * vec4 ( s i z e . x , 


size.y, 0.0, 0 


■0) ; 


texCoord = vec2 ( 1 . 0 , 1.0); 






Emit Vertex ( ) ; 






EndPrimitive ( ) ; 






} 







Listing 11.2: billboardGeometryShader.glsl 



This is done with every single position vertex in the VBO (once per particle per frame) and all the 
output primitives from the geometry shader will be rasterized, clipped and texture coordinates 
are interpolated into the fragment shader that is run once per pixel per visible primitive. Before 
we discuss the fragment shader, we should take a look at the texture that each billboard will 
have. 
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We didn't lie before, we will use a circle that actually is a real sphere rendered in the 3D modeling 
program Blender 2 . With some lighting, we get this nice shiny effect making the sphereslook more 
interesting. This texture is shown in figure 11.3, where we also see that all the pixels outside 
the circles are transparent. This is very important so that we can discard these contributions in 
the fragment shader. The fragment shader will then get an interpolated texture coordinate as 




Figure 11.3: The texture used on the billboards is an image of a real sphere rendered with 
Blender. Notice that the pixels outside the circle are transparent. This allows us to discard 
these texture pixels in the fragment shader. 

input, which we will use to look up what RGBA value this pixel gets from the texture. If the 
alpha value (the opacity) is smaller than one, it means that the pixel contribution comes from a 
texture pixel outside the circle. In that case we just discard it. We almost forgot one last detail, 
the color of the particles. Again we assume that all particles have the same color (if not, we 
just use multiple VBO's per timestep), so as we did with the billboard size variable, the color is 
available as a uniform. The final pixel color is then found by equation (10.9) 

C(p) = ct[t(p)]0c(p). 

The fragment shader code is shown in listing 11.3 with the final rendering result in figure 11.4. 
In this rendered image, we also added light effects making atoms that are far away from the 

2 Available from http://www.blender.org/. 
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#version 330 

uniform vec4 color; 

uniform sampler2D qt TextureO; 

in vec2 texCoord ; 

out vec4 My FragColor ; 

void main (void) { 
My FragColor = 
MyFragColor = 
if ( MyFragColor 
discard ; 

} 

} 



Listing 11.3: billboardFragmentShader.glsl 



texture2D (qt_TextureO , texCoord . st ) ; 
MyFragColor * color; 
.a < 0.9999) { 




Figure 11.4: The final rendered image with an MD simulation using billboards. In the rendering, 
light effects are added to increase the feeling of depth, clarifying the pore structure. The atoms 
form a nanoporous SiC-2 system simulated with a MD code developed at the University of 
Southern California. 
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11.1.2 Periodic copies 

We used the geometry shader to create a GL_ TRIANGLE_STRIP creating the billboard from 
one single vertex (the particle position). Then, one may ask, why not use the geometry shader 
to create several billboards, adding periodic copies of every particle in the system? This makes 
sense if the simulation operates with periodic boundary conditions so we have periodic symmetry 
on all directions. So, if the system is a box of S1ZG Lx ^ ^ -^z: 

we can create 26 copies of the 

whole system giving a larger box of size 3L X x 3L y x 3L Z , making the impression that the system 
is larger. This provides an important effect near the boundaries of the system, see figure 11.5. 
We will use a feature in OpenGL called geometry shader instancing to add all the copies. With 




Figure 11.5: Adding periodic copies of the system provides an important effect when looking at 
atoms near the boundary. The upper image shows a visualization with no periodic copies of the 
atoms. This shows that the system clearly has a finite size. In the lower image, by adding 26 
periodic copies of the whole system, it is not obvious that we are at the edge of the system. 

instancing, the geometry shader is executed 27 times per input vertex. We can use the variable 
gl_InvocationID to identify which of the 27 instances we are running. The system coordinate 
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(S x , S y , S z ), Si £ {—1, 0, 1} for invocation id i is calculated as 

S x =(t mod 3) - 1 (11.2) 
S y =(fi/3l mod 3) - 1 (11.3) 
5 2 =(rV9l)"l, (H.4) 

where [a] is the ceiling function. One instance of the geometry shader should then render a 
billboard displaced with 

d = (L x ■ S x , L y ■ Sy, L z ■ S z ), (11-5) 

where the system size (L x , L y , L z ) is available as a uniform. The displacement d is given in 
the model space, so we must transform it the projection space before we add it to the position 
vertex (which, as we remember, already is in the projection space). The full geometry shader 
code using instancing to add periodic copies is found in listing 11.4. 



^version 400 

layout ( i n voc a t i o n s =27) in; 
layout ( poi nts ) in; 

layout ( t r i a n g I e _ st r i p , max_vertices = 4) out; 

uniform mat4 qt_ProjectionMatrix; 

uniform mat4 q t _ M od el Vi e w P roj ect i on M a t rix ; 

uniform vec2 size ; 

uniform vec3 systemSize; 

out vec2 texCoord ; 

void main (void) { 

int x = gl _ I n vocation I D % 3 — 1; 
int y = (gl_lnvocationlD/3) % 3-1; 
int z = ( g I _ I n voca t i o n I D /9) — 1; 

vec4 displacement = vec4 ( sy st e m S i ze . x*x , sy st e m S i ze . y *y , sy st e m S ize . z*z , 0 ) ; 

vec4 pos = g I in [ 0 ] . gl Position + q t _ M od e I Vi e w P roj ect i on M a t r ix * 

displacement ; 

gl Position = pos + q t _ P roj ect i o n M a t r i x * vec4(— s i z e . x , — size.y, 0.0, 0.0); 
texCoord = vec2(0.0, 0.0); 
EmitVertexQ ; 

gl Position = pos + q t _ P roj ect i o n M a t r ix * vec4(— s i z e . x , size.y, 0.0, 0.0); 
texCoord =vec2(0.0, 1.0); 
EmitVertex() ; 

gl Position = pos + q t _ P roj ect i o n M a t r i x * vec4 ( s i z e . x , —size.y, 0.0, 0.0); 
texCoord = vec2(1.0, 0.0); 
EmitVertexQ ; 

gl Position = pos + q t _ P roj ect i o n M a t r ix * vec4 ( s i z e . x , size.y, 0.0, 0.0); 
texCoord = vec2(1.0, 1.0); 
EmitVertexQ ; 
EndPrimitive ( ) ; 

} 



Listing 11.4: billboardGeometryShaderWithPeriodicCopies.glsl 
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11.1.3 Rendering benchmark 

A typical benchmark for a visualization program is the number of frames per second (FPS) 
the program is able to render a given amount of primitives. In this benchmark, we have used 
the multibillboard class 3 that has implemented the shader pipeline explained above. It renders 
a box with N billboard spheres and adds 26 copies in the geometry shader making the total 
number of visible spheres Adheres = 27iV. In table 11.1 we have shown a representable subset 
of the data showing the performance of the billboard class. We have run the program with the 
number of visible spheres in the range a few thousand to more than one billion. In figure 11.6 
we have plotted the data from table 11.1 with more data points near the sudden drops of frame 
rate. We assume that these drops can be explained by meeting different limits on the graphics 
card. It seems to happen at specific number of vertices emitted by the geometry shader, so our 
guess would be the bandwidth on the GPU. We have not used any shader profiling tools, so we 
cannot draw any conclusion without further investigation. 



-^spheres 


FPS 


3,375 


62.09 ±3.236 


27,000 


61.80 ±3.149 


729,000 


62.03 ±3.088 


5,832,000 


62.28 ±3.061 


19,034,163 


30.51 ± 1.601 


27,000,000 


30.36 ± 1.436 


45,499,293 


20.81 ±2.276 


70,957,944 


15.15 ±0.879 


110,592,000 


12.14 ±2.180 


132,651,000 


9.99 ± 1.066 


287,496,000 


4.78 ±0.8532 


421,875,000 


3.29 ±0.4548 


1,061,208,000 


1.40 ±0.2386 



Table 11.1: Benchmark showing the number of frames per second (FPS) the billboard class is 
able to render with the number of billboard spheres from a few thousand to more than one 
billion visible spheres. The benchmark is performed on an NVIDIA GTX Titan. 



The visualization tool for an MD state does not need anything else than what we now have 
presented. The timesteps containing the positions of all the atoms are uploaded to the graphics 
card as VBOs and the atoms are rendered as points into the rendering pipeline where the 
billboards are created in the geometry shader. This same rendering technique can of course be 
used to render the particles in a DSMC simulation, but we want to render the complex geometry 
that the particles interact with. We will use what's called marching cubes for this. 



3 Source code available from http://github.com/ComputationalPhysics/compphys-qt3d-additions. 
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Billboard rendering benchmark 
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Figure 11.6: Benchmark showing the number of frames per second (FPS) the billboard class 
is able to render with the number of billboard spheres from a few thousand to more than one 
billion visible spheres. The benchmark is performed on an NVIDIA GTX Titan. The sudden 
drops in FPS at specific number of spheres are probably explained by reaching the bandwidth 
limit in the different shader stages. 

11.2 Marching cubes 

The geometry in DSMC is represented as voxels as we discussed in section 5.1. They form a 
scalar field with values zero for empty voxels and non-zero for filled voxels. The surface of the 
geometry is where the values of the scalar field changes from zero to a non-zero value. This 
defines the iso-surface, i.e. a surface where all the values on one side are zero, and non-zero 
on the other side. If we find a way to visualize this iso-surface, it will coincide with where the 
DSMC particles collide. We then need to create some primitives (in this case triangles) that are 
connected to each other, forming a the full iso-surface. Such an algorithm exists and is called 
marching cubes. 

Marching cubes is an algorithm used to generate a set of connected triangles from the iso-surface 
of a scalar field. The method was presented in a paper published in 1987 and has been widely 
used in medical visualizations of CT and MRI scans [28]. Assuming that the scalar field is 
discretized in space, each point - vertex in our case - has a value larger or larger than some 
chosen iso-value. Given a cube consisting of eight of these vertices, there exists 2 8 = 256 unique 
combinations (each vertex has two possibilities). Because of symmetries (a cube with only one 
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vertex being larger than the iso-value has through rotations 8 different configurations that really 
are the same configuration), this set can be reduced to 15. In figure 11.7, we see the 15 unique 
configurations and the corresponding triangles in each configuration. For a given cube, the 8 
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Figure 11.7: A cube consisting of eight vertices. Given that each vertex can have a value 
smaller or larger than some given iso-value, the cube has has 2 8 = 256 different combinations. 
This number can, as we see here, be reduced to 15 due to symmetries. An iso-surface on a 
scalar field can then be generated as renderable triangles with this technique. Image from 
http://en.wikipedia.Org/wiki/File:MarchingCubes.svg, accessed 20 March, 2014. 



vertex values (one or zero) can be represented as the bits in an 8-bit integer. The final value of 
this integer is then the index of a precomputed table containing a list of all triangles needed for 
that configuration. 

The original authors thought the 15 combinations would be enough, but it turns out that with 
these 15 configurations, there exists cases where the surface gets holes - it is not topologically 
correct. This problem was solved in 1995 by using a larger set with 33 unique configurations 
which spans out the full configuration space [6]. We have used a precomputed table with 256 
configurations made from this basis of 33 elements. Each vertex is part of at least one triangle. 
We can then compute the normal vector per vertex as an average of the normal vectors of all 
triangles it is a part of. This enables the fragment shader to get interpolated normal vectors per 
vertex giving beautiful, smooth shadows as we can see in figure 11.8. Surfaces having a normal 
vector pointing towards the camera have maximum light intensity. All other surfaces will have 
reduced light intensity proportional to the dot product between the normalized camera-to-object 
vector and the normal vector. 

Since the geometry in DSMC model is described by a scalar field with values zero, one and two, 
we can choose iso-value of one and use the marching cube algorithm to generate triangles we 
can render on the GPU. The set of triangles is uploaded to the graphics card as a VBO and is 
rendered with a simple draw call. In figure 11.8, we have rendered the packed spheres we studied 
in section 6.4 with the marching cubes algorithm. Here we used 256 x 256 x 256 voxels obeying 
periodic symmetry in all directions. Two overlapping spheres will have a smooth, shared surface 
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as we see in the top of the figure. 




Figure 11.8: Randomly packed spheres in DSMC visualized with the marching cubes algorithm. 
The geometry is made up by 256 x 256 x 256 voxels obeying periodic symmetry in all directions. 
In the top of the figure, we see two how marching cubes elegantly renders two overlapping 
spheres. Normal vectors are computed for each vertex by averaging the normal vectors of all the 
triangles it is a part of. This enables the fragment shader to get interpolated normal vectors per 
vertex giving beautiful, smooth shadows. 



11.3 Gallery 

A picture is worth a thousand words. In this section we will show some screen shots from the 
visualization tool we have made. The DSMC visualizer can both render the state of the system 
while it integrates the system forward in time - a live visualization. This makes it easier to find 
interesting regions in the system, or is great as a show-off case. 
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Figure 11.9: Here we see a live simulation on a 2013 Macbook Pro. One million DSMC par- 
ticles in a fracture created with the diamond-square algorithm (a thanks to Filip Sund who 
implemented this algorithm). We use billboards to render the particles and the marching cubes 
algorithm to create a smooth surface from the voxelized scalar field. With a good frame rate it 
is easy to study flow in any region of the system. 
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Figure 11.10: This is a live simulation of four million DSMC particles in a system consisting 
of packed spheres using the same rendering technique as in figure 11.9. The camera is placed 
outside the system where we clearly see some of the spheres being cut in half because of periodic 
boundary conditions. 
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Figure 11.11: Here we see a nanoporous silicate simulated with an MD code developed at the 
University of Southern California. The system was created by preparing the system in the j3- 
cristobalite state. It was then heated to 4500 K, stretched (increasing the system size) before it 
was cooled down, quenched, to 300 K making this beautiful pore network. The pores were then 
filled with water. The total system consists of approximately 400,000 atoms, including water as 
shown in figure 11.12. 




Figure 11.12: This is the same system as in figure 11.11, but with the water visible. We clearly 
see the water molecules form with one oxygen and two hydrogen atoms. With a non-static 
picture, with time evolution, we see the hydrogen atoms vibrate and even change which water 
molecule it is a part of. 
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Figure 11.13: Again the same system as in figures 11.11 and 11.12, but with the camera outside 
the system. 
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12.1 Summary 

After having explained and discussed all models and the results we have obtained during our 
work, it is convenient to briefly summarize the thesis. The main physical problem we discussed in 
the beginning was shale gas extraction. In the fracturing process, the gas is released from within 
the shales through small pore networks, small channels with diameter from a few nanometers and 
up. Standard hydrodynamics breaks down at this scale because slip velocity and non-continuum 
effects, so we needed a model enabling us to study gas dynamics at this scale. In chapter 3 we 
briefly discussed statistical mechanics and kinetic theory, and used this to derive the Boltzmann 
equation, the mean free path and the mean collision time. These results were used to justify the 
DSMC model in chapter 4. We explained how the model is implemented in chapter 5. With the 
parallel implementation and the voxel-based representation of arbitrary geometries, we managed 
to achieve the first two goals a) and b) in section 1.3. 

In chapter 6, we discussed the simulations with both model validation, performance tests and 
permeability analysis. The implementation shows a promising scaling performance up to at least 
512 processors in addition to computing correct velocity profiles for a large range of Knudsen 
numbers compared to work of others. We confirmed that the Knudsen correction from section 
2.13 predicted the permeability very well for cylinders, a geometry with a well defined Knudsen 
number. In the case of packed spheres, the Knudsen number is less obvious how to define due 
to statistical spread in the geometrical formation of the spheres, we found that an expectation 
value of the pore size could be used to find what we called an estimated Knudsen number (see 
appendix A). We were able to predict the permeability to the correct order of magnitude, but the 
geometrical statistical spread was reflected through a spread in the final measured permeability. 
The ratio between the largest and the smallest permeabilities was a factor two or more. 
We then moved on to discuss the second model we have studied, Molecular Dynamics, with 
an introduction to the model in chapter 7. The numerical implementation was explained in 
detail in chapter 8. We also introduced a new model for simulating fluids confined by a solid 
that interacts with realistic atomic forces and remains being a solid with the initial geometry 
by adding a harmonic oscillator potential on the atoms. The results in chapter 9 showed that 
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the parallel implementation scaled satisfactory up to at least one thousand processors. We ran 
the MD program to measure the permeability in a cylinder for different Knudsen numbers, just 
like we did with DSMC. By simulating a similar system, but with a model based on different 
physical fundamentals, this is a good way to validate the Knudsen correction for both different 
length scales and a large range of Knudsen numbers. In MD, the cylinder was 30 times smaller 
than in DSMC (15 nm vs 450 nm), with results confirming that the Knudsen correction works 
very well also for pores as small as 15 nanometers. Combining the results from both models, 
the goals c) and /) were achieved. We will get back to the Knudsen correction in the discussion 
section below. We did not manage to implement the water-silica potential in MD as we wanted 
to do in goal e). This would allow us to use MD to study flow in more realistic systems than 
with the Lennard- Jones potential. 

In the last part of the thesis, the custom 3D visualization tool, we first gave an introduction to 
OpenGL and its purpose in chapter 10. We discussed the basics of graphics programming as 
well as the more advanced, important parts of the rendering pipeline that allowed us to develop 
an extremely efficient visualization program to visualize large particle data sets with tens of 
millions of atoms rendered real-time with a decent frame rate. This was possible by using a 
combination of billboards and instanced geometry shaders, both explained in chapter 11. We 
concluded the part with a final show-off gallery in section 11.3. With this, we have achieved the 
last goal d) from section 1.3. 



12.2 Discussion 



We have used two fundamentally different models to study flow in similar systems. MD is an 
atomic model computing forces between every atom whereas DSMC is a stochastic particle model 
based on statistical mechanics. Studying a physical problem this way has a great strength; when 
two models with different assumptions agree on the results, we have, to a greater extent, reasons 
to trust them than if it was just one model. But in any model, we make assumptions, and these 
may produce wrong answers. The main reason to use DSMC rather than MD is of course the 
reduced computational cost. This points out an important fact; the more physics you include 
in a model, the more computational expensive it is. This leads to the fundamental problem we 
always meet before choosing a model; what is the physical problem, and what are the relevant 
phenomena? 

A complete model of shale gas extraction should include physics describing both the gas pro- 
duction in the nanometer pores as well as the large scale flow from the fractures into the drilling 
hole. Today we do not have any good models covering this huge range in length scales and time 
scales. On the larger scale, we use continuum mechanics whereas the smaller scale requires mod- 
els like MD or DSMC. On the smallest scale there are several alternatives to these two. Other 
popular methods are the lattice Boltzmann (LB), dissipative particle dynamics (DPD), which 
both can simulate larger systems for lager times. But again, a faster model usually includes less 
physics. A comparison between DSMC and these models would be interesting. 

Both models we have studied have indeed shown promising results, calculating permeabilities 
in nanopores in agreement with the Knudsen correction, but there might be other important 
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effects we have not included in the models. Such effects are probably not incorporated in the 
Knudsen correction either, but could become evident in an experiment. Validation of a model 
is not necessarily a validation of the assumptions. For example, if the surfaces inside the pores 
have a non-zero net charge, it could significantly affect the fluid flow and the permeability. 
We can incorporate this in MD by using a more advanced potential. The model we used to 
create a solid in MD used a harmonic oscillator potential to keep the atoms at approximately 
the same position as they started, keeping the geometrical form of the solid. While this model 
includes realistic atomic forces and vibrations inside the solid, deformations and fracturing is 
now impossible since the atoms are forced in the original formation. 

It is also assumed that the gas is trapped both inside already existing pores as well as adsorbed 
onto the organic material inside the shales. The rate of desorption from the organic material may 
require additional models releasing the gas from the organic material to the fracture network. 
These are just a few, but important remarks about the limitations of the models. It is needless 
to say that they are not complete in any sense. Even if they were complete, if they calculated 
both gas production and flow from the smallest pores into the fracture network, we would not 
be even close to a complete model of a shale reservoir. The coupling between different length 
scales - multi scale physics - is still an unsolved problem in physical science. 

From the theoretical perspective, we used the Knudsen correction to predict permeabilities in 
geometries with a closed form solution for the absolute permeability at the no-slip scale. If we 
work with a geometry without a known absolute permeability, we can measure this in the high 
density limit, and still use the Knudsen correction to predict permeabilities for dilute gases. The 
correction factor is a function of the Knudsen number, which is a well defined quantity in simple 
geometries like a cylinder. However, for more complicated geometries like the packed spheres, 
the Knudsen number should rather be seen as a distribution than one single number. It is 
possible, as we discuss in appendix A, to calculate a distribution of Knudsen numbers providing 
both the mean value and the standard deviation which may be used to develop a stochastic 
Knudsen correction based on the original model. By obtaining statistical information about the 
geometry of a real shale reservoir, such a theory could in principle enable us to do an economical 
risk analysis based on the permeability distribution. 



12.3 Future work 



No matter how much work we do, new ideas pop our minds faster than we manage to complete 
the old ones. During the past months, through everything we have learned, both ideas about 
extensions to the models and applications have emerged. First of all, as we discussed in section 
4.7, our analysis of whether or not the system has reached a steady state is inadequate. New, 
automatic methods should be incorporated by, in several regions (for example by using the 
collision cells in DSMC), looking at expectation values and the variance of physical quantities 
such as momentum, energy and temperature. Even with a more advanced analysis of if the 
steady state is reached or not, reaching a steady state may take a long time. In the benchmark 
we performed and discussed in section 6.2, 512 processors used 545 seconds to proceed 500 
timesteps with one billion particles. If the system requires 100k timesteps to reach a steady 
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state, this would require 15k cpu hours just to reach the steady state. With DSMC, it may be 
possible to use a smaller number of particles (that is, increase the number of effective atoms 
per simulated particle) while the system reaches a steady state. Then, before the sampling, 
increase the total number of particles by distributing a number of particles with the velocities 
and densities computed from the previous simulation, equilibrating this state before starting the 
sampling. 

A detailed study of how the Cercignani-Lampis collision model from section 4.2 affects the 
permeability would also be a interesting and important work. Here one could for example also 
use the Lennard-Jones MD model to fire single atoms towards a surface and gather a distribution 
to compare with the Cercignani-Lampis model. 

With the voxelization model of the surface we developed in section 5.1 a few questions quickly 
arise. What are the effects of the discretization of the geometry? A real, smooth surface does 
indeed look different than a jagged surface composed by voxels. With this representation, the 
effective surface area is increased which could affect the flow. A detailed study of these effects 
should be carried out. Such a representation also requires a lot of memory as the memory 
requirement scales as 0(N S ). To save memory, we can use a sparse voxel octree representation. 
If a larger group of voxels all share the same value, only one, larger voxel is saved in memory, 
representing all the smaller voxels. 

Since the matrix already is on a discretized grid and the net momentum transfer from the 
particles is calculated during collisions, it is possible to extend the model to include deformation 
and cracking which are important processes inside the shales. Production from organic material 
can also be included by including a diffusion solver on the matrix that can desorb and adsorb 
gas particles. 

For a large number of processors, the amount of work each processor is assigned may not be 
equal to all the others. This means that a processor that is computing only half the work than 
another processor, it will spend 50% of its time waiting for the other processors to finish. The 
solution to this is known as load balancing. An analysis of the system prior to the simulation 
may distribute the amount of work not by dividing the system into equal total volumes, but equal 
pore volumes in which the particles are. The amount of work is proportional to the number of 
particles a processor has. A last, but very interesting task we would like to do is simulate gas 
through a nanoporous media from a real material. Such data can be obtained with a technique 
called focused ion beam scanning electron microscopy (FIB-SEM). 
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Knudsen number of a packed spheres 
system 



If we want to use the Knudsen correction factor (equation (2.19)) to predict permeabilities in a 
nanoporous system consisting of packed spheres, we need to derive some statistical properties 
of the system giving us enough information to calculate the expected Knudsen number. Now, 
consider a volume V consisting of N points (sphere centers) placed randomly and independently 
in the system. Note that this allows spheres with radius r to overlap. The density of points is 
of course given by 

n=y- (A.l) 

The probability of placing a sphere center in a volume v is v/V, yielding the probability of not 
placing a sphere center in that same volume (1 — v/V). If we now place N such points, the 
probability Pq{N) of not having placed any points in that volume is given by 

N 

(A.2) 

where we have used that v/V = nv/nV = nv/N using equation (A.l). In the limit where 
N — > oo and V — > oo, keeping the density n and v constant, the probability approaches 

N 

= exp(-nv). (A. 3) 

We can use this to compute the probability of finding no sphere centers within a distance I from 
a point 

P 0 (/) = exp(- 4 | r / 3 ), (A.4) 

where we just used the volume of a sphere v = 4/37rZ 3 . This is the cumulative probability distri- 
bution of finding no sphere centers within a distance I, so the inverse problem, the probability 
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of finding at least one sphere center at some distance x < I is now easy to calculate 

P(x <l) = l-exp(-^A . (A.5) 



This is also a cumulative distribution function which we can differentiate to find the probability 
distribution of distances I 

p(l) = innl 2 exp (- ^Aj • (A.6) 

p(l)dl is the probability of finding a sphere center within the range [I, l + dl). In this calculation, 
we are only interested in the distribution of distances I given that we are not inside any spheres. 
So we define a new distribution function 

/in f 0 if I < r, / a r,\ 

9 » = Up(0 in>r! ^ ^ 

where M is the normalization constant for q(l) (the area must be less than one now since we 
removed all the points from zero to r). We find M by integrating 

oo oo 

1 = M j q{l)dl = M J p(l)dl (A.8) 

0 r 

oo 

= M4irn [ I 2 exp I -^A dl (A.9) 



= Mexp^-^r 3 j, (A.10) 

which gives M = exp (^p-r 3 ) ■ Avoiding the points being inside the spheres is of course the same 
as choosing only the points that are in the pore space. The probability of randomly choosing 
such a point is actually the porosity (ft (remember, the porosity is pore space volume divided by 
total volume). It is found by using equation (A.4) 

0 = exp(-^V), (A.ll) 

which we recognize as M~ l . We can then rewrite q(l) 

m _ J 0 if I < r, ,. , 

q[L) ~ \ Anircft-H 2 exp (-^/ 3 ) if I > r. {AAZ) 

Now it would be interesting to find the average distance to the sphere centers. It is found by 
calculating 

oo 



4 "7M 4n " 3 



(/) = / l q (l)dl = — / Pexp —I* dl (A.13) 
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where T(a, x) is the incomplete gamma function defined as 

oo 

T(a,x) = J t 0 -- 1 exp(-t)dt. 



(A.15) 



We can use equation (A. 11) to replace the dependency of the number density n with the porosity 
(f) and sphere radius r by solving for n 



n 



47rr 3 



In. 



c-l 



and then insert this into equation (A. 14) to obtain 



(/(r,0)) = -[ln0- 1 ] srj-Jn^ 1 



(A.16) 



(A.17) 



What we really want is the distances to the surfaces of the spheres, d = (I —r) which is easy to 
find 



(d(r,4>)} = (l(r, ( / ) )}-r 



= r 



i-i 



[ln^ 1 
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(A.18) 
(A.19) 



In the expression for the Knudsen number (Kn = X/L), we will use this quantity instead of L 
as our best estimate, or average value, of the channel diameter. We then define the estimated 
Knudsen number Kn(r, 0)* as 



Kn(r, 4>Y 
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The Liouville operator and time 
integrators 



The derivation of time integrator schemes are usually done in a mathematical sense using Taylor 
expansions. Given a Taylor expansion f{x + h) = f{x) + hf'(x) + h 2 /2f"(x) + we often 
truncate at some term, yielding a truncation error 0(h n ). In physics, there are other properties 
of a system of which the error doesn't scale as the truncation error. A typical example is the 
energy of a particle system. Two different time integrators may have the same truncation error, 
but have very different long term behavior if one has a drift in the energy whereas the other 
doesn't. 

There are other properties that might be difficult to measure through a mathematical analysis 
of the Taylor expansion that may be more important than the truncation error itself. In the 
Hamiltonian formulation of classical mechanics, we obtain the equations of motion through the 
energy operator H = T + V, where T and V are operators measuring the kinetic and poten- 
tial energy. Using the physical description of a system can help developing better integration 
schemes. 

In this chapter we will address the equations of motion by using the Liouville operator to derive 
a way to create time integration schemes. We start by looking at the phase space coordinates 
and define the Liouville operator in section B.l. We then define the time evolution operator 
and split the Liouville operator into two operators; one operator acting on the positions and 
one acting on the momenta. These operators do not commute, so we use the Trotter identity 
to introduce time discretization and derive the Velocity Verlet algorithm, which is the one used 
in the Molecular Dynamics code. We then do an analysis of the mathematical error to find the 
local error (which is the same as the truncation error) in section B.3. This derivation is done as 
in [11]. 
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B.l Liouville operator 

The physical system consists of N particles, each having three positions and three momenta 
defining the phase space point (r, p). Now assume some function of these variables f(r(t),p(t)) = 
f(t) (the function is indirectly a function of time) that has the time derivative 

A.) -*«M + ^j?> =*/<<), (B.i) 

where we have defined the Liouville operator 

(B.2) 

This allows us to define the time evolution operator U{t) 

f(t)=U(t)f(0) = e itt f(0), (B.3) 

which is easily verified 

f(t) = it [e ltt f(0)] = iL/(t). (B.4) 
If we now split the Liouville operator into two parts 

iti = iL r + ilLp, (B.5) 

so that 

it r = r(0)^. (B.6) 
Let us see what this operator can do if we insert it into equation (B.3) and expand the exponential 

f{t) = exp (iL r t) /(0) (B.7) 

= ^P (rt^j /(0) (B.8) 

n=0 

= /[(r(0) + r(0)t),p(0)]. (B.10) 

It does just what we expect it to do, it is a displacement operator, moving the points in the 
phase space according to their time derivative. The momentum Liouville operator does of course 
exactly the same, so that by applying the total time evolution operator, we do indeed get 

f(t) = e^/(0) = / [(r(0) + r(O)t) , (p(0) + p(O)t)] . (B.ll) 

If we were to use this in a simulation, we normally do not apply the full operator at the same 
time, we might first treat the positions, then the momenta. So ideally, we would want to first 
apply one operator, then the next one 

e iL = e iL p +iL r ^ ^v e ^r^ (B.12) 
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but this is not the case since the operators do necessarily commute. However, for two operators 
A and B, we can use the Trotter identity [11] 



e A+B = lim ( e A/2M e B/M e A/2M\ 
JV-s-oo V / 



N 



which can be truncated 



e A+B = ( e A/2N e B/N e A/2N ) N e ° (1/iv2) . 



(B.13) 



(B.14) 



This can be used to derive different time integrator schemes. We will now derive the Velocity 
Verlet scheme which is used in the Molecular Dynamics code. 



B.2 Derivation of the Velocity Verlet algorithm 



The truncated Trotter identity is neat, we can replace A with the iL p and B with iL r 

M K ' dr 

defining the timestep At = t/N. The truncated time evolution operator now reads 



U(t) 



e iLpAt/2 e iL r At e iL p At/2 



N 



where we identify one timestep iteration 



U(At) = e ilj pAt/2 e iL, r At e iL,pAt/2 



If we now apply this operator on /(0), we first apply the rightmost operator 

e^ At / 2 /[p(0),r(0)] = /|r(0), 
before applying exp(iL r t) 

e^/{r(0), 
= /|[r(0) + Atr(At/2)], 
Our last operator gives the final result 

/ j[r(0) + Atr(At/2)] , [p(0) + ^p(O) + ^p(At) 



At 

P (0) + -y P (0) 



At 

p(0) + — p(0) 
P (0) + ^p(O) 



(B.15) 
(B.16) 

(B.17) 
(B.18) 

(B.19) 



(B.20) 
(B.21) 

(B.22) 
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These steps can be summarized as one usually does with time integrators 

v(At/2)=v(0) + ^^ (B.23) 
r(At) = r(0) + v(A*/2)A* (B.24) 

v(At) =v(At/2) + F ^ — , (B.25) 
m 2 

where we have replaced p with the equivalent (F/m) and r with v which is valid if the forces 
can be calculated from the position. These steps are called the Velocity Verlet algorithm and 
has, as we now will see, an error 0(Ai 3 ). 



B.3 Truncation error 



During one timestep iteration, we approximate the Liouville operator 

e iLAt ^ e itpAt/2 e it r At e it p At/2 _ giLAt+e 26) 

where we have introduced the error operator e that represents our truncation error. These 
are linear operators on which we can use the Campbell-Baker-Hausdorff expansion of general, 
non-commuting linear operators 

g AA e AB = e AA+AB+^[A,B] + f|[A,[A,B]] + f|[B,[A,B]]+... (B.27) 

together with 

e A e B = e B+[A,B]+i[A,[A,B]] + J T [A,[A,[A,B]]]+... e A ^ 2 8) 

to find the leading term in e 

- (At) 3 (JfiU, [it r , it p ]] + ^ \it p , \itr, it p ]]j , (B.29) 

where we see that At 3 is the global truncation error in the Velocity Verlet algorithm after n 
timesteps. The local truncation error is then At 4 . 



Appendix C 

Derivation of pressure in Molecular 
Dynamics 



A substance in a Molecular Dynamics simulation does not generally satisfy the ideal gas equation 
of state. The pressure has the general form 



oo 



P = k B T\p™B m {T), (C.l) 



771=1 



where the functions B m (t) are called the virial coefficients with B\(T) = 1, yielding the ideal gas 
pressure [23]. We will now derive an expression for the pressure of this form by using Clausius' 
virial function. Assume that we have the positions of all atoms, r, and define 



N 



W(r) = £r n .F™ T , (C.2) 

n=l 

where F^ OT is the total force acting on atom n, including external forces. We assume equilib- 
rium, so that the kinetic energy has reached an approximately constant value (it will of course 
fluctuate with standard deviation l/y/N as usual). We measure the statistical average of W by 
computing (using F = ma. = rar) 

i r* N 

(W) = Urn - / dr Vm n r n (r)-r n (r). (C.3) 
t Jo ^1 

Integrating by parts gives 

N 



(W) 



lim \ [ dr Vmilr^r)! 2 = -2{E k ) = -3Nk b T, (C.4) 



by using equipartition. Now, assume that the atoms live inside a parallelepipedic container of 
size L x , L y , L z with hard walls (they don't move) and origo in one of its corners. If we divide the 
force into external and interatomic forces, F^ OT = F n + Fj^ XT , and assume that the external 
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forces are forces from the container (no gravity or electric fields), we can calculate W EXT . The 
atoms near the walls apply a pressure on the wall P = F/A. As an example, we look at all the 
atoms that are near the wall located at x = L x . The virial function gives 



A r .x 



n=l 

where n now sums over all atoms that are near the container wall at x = L x . The position vectors 
are r n = (L x , y n , z n ) (for different y n and z n ) and the force has only a component normal to the 
wall F^ XT = l/N x (-PL y L z , 0,0). We then get 

Wf XT = —L x PL y L z = -PV, (C.6) 

and by doing the same for the other dimensions, we get 

W = -3PV. (C.7) 

Inserting this result into (C.4) yields 

^r n -F n | -3PV = -3Nk b T (C.8) 

which can be rearranged to 

PV = Nk b T-±(^rn-F n ). (C.9) 
Using this result, we can define the pressure 

1 



\n=l 



P = PnhT- w 



I N \ 

J> n -F n \, (CIO) 

\n=l I 



where p n is the number density. 
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Workflow and Python scripting 



When we are running simulations to study flow in nanoporous media, one single experiment 
consists of more than just starting the program and waiting for the result. A typical simulation 
scheme could be 

1. Prepare geometry 

2. Initialize system 

3. Reach a steady state 

4. Sample statistics. 

Some of these steps aren't even performed by the same C++ program, so we have developed a 
scripting framework in Python wrapping all the functionality from the different C++ programs 
into one scripting framework. This simplifies the workflow all the way from the very idea of 
some physical problem to running the simulation to study it. In this appendix we will discuss 
the Python framework and show the workflow of how to study problem, from idea to simulation, 
using DSMC. First, we quickly explain the basic principles of the Python framework. 



D.l The Python framework 

The framework is a Python class with a lot of convenience methods built-in. Its main job is to 
compile the C++ programs and simplify the generation of config files that are used by the C++ 
DSMC program. The class is called DSMC and is best explained by a code example, see listing 
D.l. This allows us to quickly create a script running a full simulation consisting on several 
steps in a short amount of time. 



class DSMC: 

def init (self , compiler = "mpic++" , dt=0.001, nx = l, ny = l, nz = l) : 
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# Physical parameters 

s e I f . u n i t _ co n ve rte r = DS M C_ u n it _con verter ( s e I f ) 

s e I f . tern peratu re = u n i t _ co n ve rt e r . te m pe ra t u re_f rom _si (T=300) 

s e I f . d e n s i t y = u n i t _ co n ve rt e r . de nsi ty _ f ro m _ si ( r ho = 1 .0 e25 ) 

self.diam = u n i t _ co n ve rt e r . I e ngt h _ f rom _ si ( 3 . 6 2 e — 10) 

self. mass = u n i t _ co n ve rt e r . mass_f rom _si ( 6 . 63 e — 26) 



s e I f . dt = dt 

s e I f . co m p i I e r = compiler 
s e I f . nx = nx 
s e I f . ny = ny 
self.nz = nz 



self . timesteps = 10000 

s e I f . s u rf a c e _ i n t e r a c t i o n = "thermal" 

# ■■■ 



def sa ve _ sta te ( se I f , path): 

# ■■■ 

def load_state(self, path): 

# ... 

def prepa re_ new_system ( s e I f ) : 

# ••• 

def a p p I y _ p ressu re _ gra d i e n t _ pe rce n t a ge ( s e I f , f a ct or _ of _ i d ea I _ ga s _ p ress u re 

): 

# ... 



def c re a t e _ co n f i g _ f i I e ( s e I f ) : 

o r i g i n a I _ f i I e = open (" program/000 _conf ig_f iles /dsmc . ini ,' r ') ; 
output_file = open (' dsmc . ini ',' w ') ; 



for line in or iginal_f ile : 

line = line . replace (' __dt__ ', str ( self . dt ) ) 

line = line . replace (' __dens it y__ ', str ( self . density ) ) 

# ... 

output_file . writelines (line) 



def compile ( self ) : 
# ... 



def run ( self ) : 
# ... 

} 



Listing D.l: dsmcconfig.py. This program illustrates how the framework script is created. A 
lot of convenience functions are built-in so that running a simulation is a simple process. The 

create_config_file() function replaces the string value in a dummy config file with the 

real value stored in the class. 
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D.2 From idea to simulation 

Create the geometry 

The DSMC program package we have developed is an excellent tool to study flow of dilute 
gases in complex nanoporous media. So what we usually do is to come up with the idea 
of some interesting geometry to study. This could either be a real sample dataset, a 
mathematical description or some other creative way to create the geometry. An example 
could be the packed spheres discussed in section 6.4 where each sphere is placed out 
randomly inside the volume defined by the N x x N y x N z voxel grid. All the voxels within 
a given distance (the sphere radius) from the sphere center are marked as solid voxels. Any 
other geometry can made with this technique with the code in the complexgeometry.cpp 
file. 

Initialize system 

Once we have the geometry of the system we would like to study, we need to insert gas 
particles inside the system. The gas could in principle have different densities based on 
position or other properties, but in this thesis we are placing the particles uniformly in 
the pore volume, i.e. in voxels marked as empty. We choose velocities from the Maxwell- 
Boltzmann distribution (equation (3.45)) so that the gas has the temperature we want. 

Reach a steady state 

Now that we have our gas particles inside the (possibly) complex geometry, we want to 
apply a pressure gradient inducing flow. In the beginning, all the particles are (on average) 
at rest, so we should wait until we have reached a steady state before we start sampling 
statistics. Methods to determine whether or not a system has reached a steady state was 
discussed in section 4.7. 

Sample statistics 

If we assume that the system finally has reached a steady state, we can start measuring 
the physical quantities discussed in section 4.4. The main focus in this thesis has been to 
study the permeability for different geometries. 



D.3 The run script - example. py 

Finally, after having defined all the steps in our scheme, we can gather everything into one single 
Python script. This example script shows how to perform all the steps discussed above. 



from dsmcconfig import * 

from dsmc _ u n i t con verter import * 

from dsmc_geometry_config import * 

nx = 2 # Number of processors in x— direction 
ny = 2 # Number of processors in y— direction 
nz = 2 # Number of processors in z— direction 
program = DSMC( dt =0.001 , nx=nx , ny=ny , nz=nz) 
dsmc = progra m . co m p i I e ( name=" main " ) 
geometry = DSMC_geometry ( progra m ) 
u n i t _ co n ve rt e r = DSMC_unit_converter ( program ) 
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# Save geometry files to this directory 

geometry. binary_output_folder = "../ geometries / example_geometry / " 
program . world = geomet ry . b i n a ry _ o u t p u t _ f ol d e r 

# Set the dimensionality of the voxel grid 
geometry . num_voxels_x = 128 

geometry . num_voxels_y = 128 
geometry . num_voxels_z = 128 

#geometry . create _ cylinder s ( radius = 0 . 05 , num_ cylinders_ per _ dimension = 4) 
geomet ry . c rea te _ pa c ked _ s p h e res ( r a d i u s = 0.02, spheres_type = 1, 
wa n ted _ po rosi ty = 0.5) 

# Specify the array size (maximum number of particles) 
progra m . max_ molecu les_ per_ node = 1.5e6 

# Set system size ( in micro meters ) 
progra m . Lx = 1 

progra m . Ly = 1 
program . Lz = 1 

# Select density (Kn = mean_free_path / length) 

progra m . d e n s i ty = u n i t _ co n ve rte r . density _from _ kn udsen _ n u m ber ( 
kn udsen _ n u m ber = 1 .0 , I e n gt h=progra m . Ly) 

# Prepare a new simulation , delete old files 
progra m. reset () 

# Select how many atoms each particle represents 
progra m . atoms_ per_ molecu le = 1 

# Set the number of collision cells 

program.set_number_of_cells(geometry , particles_per_cell=100) 

# Initialize gas particles 
program . prepa re_ new_system () 
program . run (dsmc) 

# Apply pressure gradient 

program.apply_pressure_gradient_percentage(factor=2.0) 

# Select low statistics measuring interval 
progra m . s t a t i s t i c s _ i n t e r v a I = 10000 

# Select num timesteps to reach steady state 
program . ti mesteps = 500000 

program . create_config_file () 
program . run (dsmc) 

# Select high statistics measuring interval 
progra m . s t a t i s t i c s _ i n t e r v a I = 100 

# Select num timesteps to sample 
progra m . t i m est e ps = 100000 

program . create_config_file () 
program . run (dsmc) 
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Listing D.2: example. py 



Appendix E 



Physical units 



The choice of units does of course not change the physical behavior. It is only a scaling of the the 
physical quantities so that their numerical values become more convenient. For example, it is 
common to choose a unit of length Lq so that one unit of length is a typical value in the physical 
system. There are however only a few free choices of units of which the other units follow from. 
In this appendix, we discuss the choices of units in both the MD and DSMC simulations and 
derive the other units. 



Here we use the so-called MD units. A convenient consequence of these units are that the 
parameters and masses in the Lennard-Jones force can be factored out which gives simpler 
calculations. The units we choose are 



E.l Choice of units in MD 



Temperature T 0 = 119.6 K 



Length L 0 = 3.405 x l(T iU m, 
Mass m 0 = 1.66 x 1(T 27 kg, 
Energy E 0 = 1.65 x 1(T 21 J, 



(E.l) 
(E.2) 
(E.3) 
(E.4) 



175 



176 Physical units 



Chapter E 



E.2 Choice of units in DSMC 

In DSMC, we use the same initial units as in MD, but with another unit of length since the 
systems normally are a few orders of magnitude larger. Here we use 

Length L 0 = 1.0 x 1(T 6 m, (E.5) 

Mass m 0 = 1.66053886 x 10~ 27 kg, (E.6) 

Energy E 0 = 1.65088 x 10~ 21 J, (E.7) 

Temperature T 0 = 119.6 K (E.8) 

Note that the mass, energy and the Boltzmann constant are equal in both MD and DSMC. 



E.3 Derivation of the other units 



The other units can be derived from the four basis units through relations like E = mc 2 and 
P = F I A. Together, these physical formulas form a set of equations that can be solved for each 
physical quantity. We find the time unit to through E = mc 2 

L 2 

E0 = m 0 -^ (E.9) 

(E.10) 




The force unit F$ is found by using Newton's second law 

Fo = m p = ^ (E.ll) 

We find the temperature To by using that we chose the Boltzmann constant to be 1.0, which 
gives 

T 0 = ^. (E.12) 
Kb 

The pressure is found through P = F/A 

Now we have all all the conversion factors between SI units and MD/DSMC units. The programs 
are written so that all input values and internal variables are in the MD/DSMC units, but we 
have a simple unit converter script that can transform any physical value both to and from SI 
units. The conversion script can be found in listing E.l. 



from dsmcconfig import * 
from math import sqrt , pi 
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class DS M C_ u nit _converter : 

def init (self, dsmc): 

self. dsmc = dsmc 
self.mO= 1.66053886e-27 # si 
self . LO = le-6 # si 

self . EO = 1.65088e-21 # si 

self.kb = 1.3806503e-23 # si 

self. tO = self . L0*sqrt ( self .mO/self . EO) 

self.FO = self.EO/self.LO 

self .TO = se If . EO/ self . kb 

self.PO = se If .m0/( self . t0**2* self . LO) 

self.vO = self. LO/ se I f . tO 

self.aO = self.vO/self.tO 

self.viscO = se I f . PO* se I f . tO 

self.diffO = self . L0**2/ self . tO 

self.permO = self.L0**2 

s e I f . n u m ber_ density 0 = 1 . 0 / ( s e I f . LO * *3) 

def p ress u re _ to _ s i ( s e I f , P) : 
return P* self.PO 

def p ress u re _ f ro m _ si ( se I f , P) : 
return P/ se If . PO 

def tern pe ra t u re _ to _ si ( s e I f , T) : 
return T* self. TO 

def tern pe ra t u re_ f rom _ si ( s e I f , T) : 
return T/ self .TO 

# All the other physical quantities can be calculated like this 



Listing E.l: dsmcUnitConverter.py 
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